Class: Mapi::Pst::BlockParser

Inherits:
Object
Includes:
Types::Constants
Defined in:
lib/mapi/pst.rb

Overview

The job of this class is to take a desc record and enumerate the MAPI properties of the associated item.

Corresponds to the following libpst functions:

  • _pst_parse_block

  • _pst_process (in some ways, although that is perhaps closer to Item::Properties#add_property)

Constant Summary

TYPES =
{
	0xbc => 1,
	0x7c => 2,
	# type 3 is removed. an artifact of not handling the indirect blocks properly in libpst.
}
PR_SUBJECT =
PropertySet::TAGS.find { |num, (name, type)| name == 'PR_SUBJECT' }.first.hex
PR_BODY_HTML =
PropertySet::TAGS.find { |num, (name, type)| name == 'PR_BODY_HTML' }.first.hex
IMMEDIATE_TYPES =
[
	PT_SHORT, PT_LONG, PT_BOOLEAN
]
INDIRECT_TYPES =
[
	PT_DOUBLE, PT_OBJECT,
	0x0014, # whats this? probably something like PT_LONGLONG, given the correspondence with the
					# ole variant types. (= VT_I8)
	PT_STRING8, PT_UNICODE, # unicode isn't in libpst, but added here for outlook 2003 down the track
	PT_SYSTIME,
	0x0048, # another unknown
	0x0102, # this is PT_BINARY vs PT_CLSID
	#0x1003, # these are vector types, but they're commented out for now because i'd expect that
	#0x1014, # there's extra decoding needed that i'm not doing. (probably just need a simple
	#        # PT_* => unpack string mapping for the immediate types, and just do unpack('V*') etc
	#0x101e,
	#0x1102
]
ID2_ATTACHMENTS =
0x671
ID2_RECIPIENTS =
0x692
USE_MAIN_DATA =
# Target the main data, not the sub data.
-1

Constructor Details

#initialize(node, local_node_id = USE_MAIN_DATA) ⇒ BlockParser

Returns a new instance of BlockParser.

Parameters:

  • node (NodePtr)
  • local_node_id (Integer) (defaults to: USE_MAIN_DATA)


# File 'lib/mapi/pst.rb', line 1002

def initialize node, local_node_id = USE_MAIN_DATA
	#raise FormatError, "unable to get associated index record for #{node.inspect}" unless node.block
	@node = node
	@data_chunks = {}

	data_array = (local_node_id == USE_MAIN_DATA) ? node.read_main_array : (node.read_sub_array local_node_id)

	data_array.each_with_index { |data, index|
		# see https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/a3fa280c-eba3-434f-86e4-b95141b3c7b1
		if index == 0
			load_root_header data
		else
			load_page_header data, index
		end
	}

	# now, we may have multiple different blocks
end

Instance Attribute Details

#data_chunks ⇒ Hash<Integer, String> (readonly)

Returns a mapping from HID to data block.

Returns:

  • (Hash<Integer, String>)

    mapping from HID to data block



# File 'lib/mapi/pst.rb', line 998

def data_chunks
  @data_chunks
end

#node ⇒ NodePtr (readonly)

Returns:

  • (NodePtr)

# File 'lib/mapi/pst.rb', line 994

def node
  @node
end

Instance Method Details

#get_data_array(offset) ⇒ Array<String>

Parameters:

  • offset (Integer)

Returns:

  • (Array<String>)


# File 'lib/mapi/pst.rb', line 1107

def get_data_array offset
	raise "offset must be Integer" unless Integer === offset

	if offset == 0
		nil
	elsif (offset & 0x1f) != 0
		# this is NID (node)
		node.read_sub_array(offset)
	else
		# this is HID (heap)
		[data_chunks[offset]]
	end
end
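
The branch on `offset & 0x1f` above can be tried in isolation. The following is a minimal sketch, not part of Mapi::Pst: `classify_hnid` is a hypothetical helper that only reports which branch a given value would take, without reading any data.

```ruby
# Hypothetical helper (not in lib/mapi/pst.rb): reports which branch of
# get_data_array a given HNID would take, by testing the low five bits.
def classify_hnid(hnid)
  raise ArgumentError, "hnid must be Integer" unless hnid.is_a?(Integer)

  if hnid.zero?
    :none # get_data_array returns nil here
  elsif (hnid & 0x1f) != 0
    :nid  # low five bits are non-zero: treated as a node id
  else
    :hid  # low five bits are zero: treated as a heap id
  end
end

classify_hnid(0)      # :none
classify_hnid(0x20)   # :hid, the first heap item of the first page
classify_hnid(0x671)  # :nid, the value of ID2_ATTACHMENTS
classify_hnid(0x692)  # :nid, the value of ID2_RECIPIENTS
```

Both ID2_ATTACHMENTS and ID2_RECIPIENTS have non-zero low bits, so they are resolved through node.read_sub_array rather than through data_chunks.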

#get_data_indirect(offset) ⇒ String

Based on the value of offset, returns either some data from buf or some data from the id2 chain, where offset is a key into a lookup table that is stored as the id2 chain. I think I may need to create a BlockParser class that wraps up all this mess.

corresponds to:

  • _pst_getBlockOffsetPointer

  • _pst_getBlockOffset

Parameters:

  • offset (Integer)

Returns:

  • (String)


# File 'lib/mapi/pst.rb', line 1072

def get_data_indirect offset
	raise "offset must be Integer" unless Integer === offset

	return get_data_indirect_io(offset).read
end

#get_data_indirect_io(offset) ⇒ StringIO

Resolves the data pointed to by an HNID.



# File 'lib/mapi/pst.rb', line 1084

def get_data_indirect_io offset
	raise "offset must be Integer" unless Integer === offset

	if offset == 0
		nil
	elsif (offset & 0x1f) != 0
		# this is NID (node)
		data_array = node.read_sub_array(offset)
		raise "local node id #{offset} points multi page count #{data_array.count}, use get_data_array() instead" if data_array.count >= 2
		if data_array.empty?
			StringIO.new ""
		else
			StringIO.new data_array.first
		end
	else
		# this is HID (heap)
		StringIO.new data_chunks[offset]
	end
end

#handle_indirect_values(key, type, value) ⇒ Object



# File 'lib/mapi/pst.rb', line 1121

def handle_indirect_values key, type, value
	case type
	when PT_BOOLEAN
		value = value != 0
	when *IMMEDIATE_TYPES # not including PT_BOOLEAN which we just did above
		# no processing currently applied (needed?).
	when *INDIRECT_TYPES
		# the value is a pointer
		if String === value # ie, value size > 4 above
			value = StringIO.new value
		else
			value = get_data_array(value)
			if value
				value = StringIO.new value.join("")
			end
		end
		# keep strings as immediate values for now, for compatibility with how i set up
		# Msg::Properties::ENCODINGS
		if value
			if type == PT_STRING8
				value = node.pst.helper.convert_ansi_str value.read
			elsif type == PT_UNICODE
				value = Ole::Types::FROM_UTF16.iconv value.read
			end
		end
		# special body_html and subject handling
		if key == PR_BODY_HTML and value
			# to keep the msg code happy, which thinks body_html will be an io
			# although, in 2003 version, they are 0102 already
			value = StringIO.new value unless value.respond_to?(:read)
		end
		if key == PR_SUBJECT and String === value and value.length >= 2
			if value[0].ord == 1
				# This 2-character header tells us how to omit a subject prefix such as `Yes: ` or `Re: `.
				# We do not need to omit the prefix, so only the header itself is dropped.
				value = value[2..-1]
			end
=begin
			index = value =~ /^[A-Z]*:/ ? $~[0].length - 1 : nil
			unless ignore == 1 and offset == index
				warn 'something wrong with subject hack' 
				$x = [ignore, offset, value]
				require 'irb'
				IRB.start
				exit
			end
=end
=begin
new idea:

making sense of the \001\00[156] i've seen prefixing subject. i think its to do with the placement
of the ':', or the ' '. And perhaps an optimization to do with thread topic, and ignoring the prefixes
added by mailers. thread topic is equal to subject with all that crap removed.

can test by creating some mails with bizarre subjects.

subject="\001\005RE: blah blah"
subject="\001\001blah blah"
subject="\001\032Out of Office AutoReply: blah blah"
subject="\001\020Undeliverable: blah blah"

looks like it

=end

			# now what i think, is that perhaps, value[offset..-1] ...
			# or something like that should be stored as a special tag. ie, do a double yield
			# for this case. probably PR_CONVERSATION_TOPIC, in which case i'd write instead:
			# yield [PR_SUBJECT, ref_type, value]
			# yield [PR_CONVERSATION_TOPIC, ref_type, value[offset..-1]
			# next # to skip the yield.
		end

		# special handling for embedded objects
		# used for attach_data for attached messages. in which case attach_method should == 5,
		# for embedded object.
		if type == PT_OBJECT and value
			value = value.read if value.respond_to?(:read)
			id2, unknown = value.unpack 'V2'
			io = get_data_indirect_io id2

			# hacky
			#desc2 = OpenStruct.new(:node => io, :pst => node.pst, :sub_block => node.sub_block, :children => [])
			# put nil instead of desc.list_index, otherwise the attachment is attached to itself ad infinitum.
			# should try and fix that FIXME
			# this shouldn't be done always. for an attached message, yes, but for an attached
			# meta file, for example, it shouldn't. difference between embedded_ole vs embedded_msg
			# really.
			# note that in the case where its a embedded ole, you actually get a regular serialized ole
			# object, so i need to create an ole storage object on a rangesioidxchain!
			# eg:
=begin
att.props.display_name # => "Picture (Metafile)"
io = att.props.attach_data
io.read(32).unpack('H*') # => ["d0cf11e0a1b11ae100000.... note the docfile signature.
# plug some missing rangesio holes:
def io.rewind; seek 0; end
def io.flush; raise IOError; end
ole = Ole::Storage.open io
puts ole.root.to_tree

- #<Dirent:"Root Entry">
|- #<Dirent:"\001Ole" size=20 data="\001\000\000\002\000...">
|- #<Dirent:"CONTENTS" size=65696 data="\327\315\306\232\000...">
\- #<Dirent:"\003MailStream" size=12 data="\001\000\000\000[...">
=end
			# until properly fixed, i have disabled this code here, so this will break
			# nested messages temporarily.
			#value = Item.new desc2, RawPropertyStore.new(desc2).to_a
			#desc2.list_index = nil
			value = io
		end
	# this is PT_MV_STRING8, i guess.
	# should probably have the 0x1000 flag, and do the or-ring.
	# example of 0x1102 is PR_OUTLOOK_2003_ENTRYIDS. less sure about that one.
	when 0x101e, 0x1102
		# example data:
		# 0x802b "\003\000\000\000\020\000\000\000\030\000\000\000#\000\000\000BusinessCompetitionFavorites"
		# this 0x802b would be an extended attribute for categories / keywords.
		value = get_data_indirect_io(value).read unless String === value
		num = value.unpack('V')[0]
		offsets = value[4, 4 * num].unpack("V#{num}")
		value = (offsets + [value.length]).to_enum(:each_cons, 2).map { |from, to| value[from...to] }
		value.map! { |str| StringIO.new str } if type == 0x1102
	when 0x101f
		value = get_data_indirect_io(value).read unless String === value
		num = value.unpack('V')[0]
		offsets = value[4, 4 * num].unpack("V#{num}")
		value = (offsets + [value.length]).to_enum(:each_cons, 2).map { |from, to| value[from...to] }
		value.map! { |str| Ole::Types::FROM_UTF16.iconv str }
	when 0x1003 # uint32 array
		value = get_data_indirect_io(value).read unless String === value
		# there is no count field
		value = value.unpack("V#{(value.length / 4)}")
	else
		name = Mapi::Types::DATA[type].first rescue nil
		warn '0x%04x %p' % [key, get_data_indirect_io(value).read]
		raise NotImplementedError, 'unsupported mapi property type - 0x%04x (%p)' % [type, name]
	end
	[key, type, value]
end
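
The 0x101e branch above decodes a count, an offset table, and then the packed strings. As a minimal sketch, the same slicing can be run on the example data quoted in the source comment; `split_multi_value` is a hypothetical standalone helper, not a method of BlockParser.

```ruby
# Hypothetical standalone helper mirroring the 0x101e branch:
# a little-endian uint32 count, then that many uint32 offsets,
# then the string data the offsets point into.
def split_multi_value(blob)
  count = blob.unpack1('V')
  offsets = blob[4, 4 * count].unpack("V#{count}")
  # Pair each offset with the next one (or the end of the blob) and slice.
  (offsets + [blob.length]).each_cons(2).map { |from, to| blob[from...to] }
end

# Same bytes as the 0x802b example in the source comment:
# count = 3, offsets = 16, 24, 35.
blob = [3, 16, 24, 35].pack('V4') + 'BusinessCompetitionFavorites'

split_multi_value(blob) # three strings: "Business", "Competition", "Favorites"
```

The 0x101f branch slices the same way and then converts each piece from UTF-16.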

#load_page_header(data, page_index) ⇒ Object

Parse HNPAGEHDR / HNBITMAPHDR



# File 'lib/mapi/pst.rb', line 1028

def load_page_header data, page_index
	page_map = data.unpack('v').first

	# read HNPAGEMAP
	offsets_count = data[page_map, 2].unpack("v").first + 1
	offset_tables = data[page_map + 4, 2 * offsets_count].unpack("v#{offsets_count}")

	offset_tables.each_cons(2).to_a.each_with_index do |(from, to), index|
		# convert to HID
		@data_chunks[0x20 * (1 + index) + 65536 * page_index] = data[from, to - from]
	end
end
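
To make the page-map walk and the HID arithmetic above concrete, here is a minimal sketch on a synthetic page. `heap_items` is a hypothetical standalone function and the page bytes are made up for illustration; only the unpacking steps and the key formula are taken from load_page_header.

```ruby
# Hypothetical standalone version of the page-map walk in load_page_header.
def heap_items(data, page_index)
  page_map = data.unpack1('v')                 # offset of the page map
  count = data[page_map, 2].unpack1('v') + 1   # allocation count + 1 offsets
  offsets = data[page_map + 4, 2 * count].unpack("v#{count}")

  items = {}
  offsets.each_cons(2).with_index do |(from, to), index|
    # Same key as in the source: 1-based item index shifted left by 5 bits,
    # plus the page index in the upper 16 bits.
    items[0x20 * (1 + index) + 0x10000 * page_index] = data[from, to - from]
  end
  items
end

# Synthetic page: 2-byte page-map offset, two allocations, then the page map
# (allocation count, free count, and three allocation offsets).
body = 'abc' + 'defgh'
page_map_offset = 2 + body.bytesize
page = [page_map_offset].pack('v') + body + [2, 0, 2, 5, 10].pack('v5')

items = heap_items(page, 1)
items[0x10020] # "abc"
items[0x10040] # "defgh"
```

Because the low five bits of every key are zero, get_data_array and get_data_indirect_io treat these keys as HIDs and look them up in data_chunks.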

#load_root_header(data) ⇒ Object

Parse HNHDR



# File 'lib/mapi/pst.rb', line 1045

def load_root_header data
	page_map, sig, @heap_type, @offset1 = data.unpack 'vCCVV'
	raise FormatError, 'invalid signature 0x%02x' % sig unless sig == 0xec
	raise FormatError, 'unknown block type signature 0x%02x' % @heap_type unless TYPES[@heap_type]
	@type = TYPES[@heap_type]

	# read HNPAGEMAP
	offsets_count = data[page_map, 2].unpack("v").first + 1
	offset_tables = data[page_map + 4, 2 * offsets_count].unpack("v#{offsets_count}")

	offset_tables.each_cons(2).to_a.each_with_index do |(from, to), index|
		# convert to HID
		@data_chunks[0x20 * (1 + index)] = data[from, to - from]
	end
end
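
The header check above can be sketched on a hand-built 12-byte header. `parse_hnhdr` and `HEAP_TYPES` are hypothetical names introduced for this illustration, and the field labels `user_root` and `_fill_level` are assumptions based on a reading of the HNHDR layout; the source itself stores the fourth field as @offset1 and discards the fifth.

```ruby
# Copy of the TYPES table from the constant summary, under a hypothetical name.
HEAP_TYPES = { 0xbc => 1, 0x7c => 2 }.freeze

# Hypothetical standalone sketch of the checks in load_root_header.
def parse_hnhdr(data)
  # uint16 page-map offset, two single bytes, two uint32 values.
  page_map, sig, client_sig, user_root, _fill_level = data.unpack('vCCVV')

  raise ArgumentError, format('invalid signature 0x%02x', sig) unless sig == 0xec

  type = HEAP_TYPES.fetch(client_sig) do
    raise ArgumentError, format('unknown block type signature 0x%02x', client_sig)
  end

  { page_map: page_map, type: type, user_root: user_root }
end

# Made-up header: page map at offset 12, signature 0xec, client signature 0xbc.
header = [12, 0xec, 0xbc, 0x20, 0].pack('vCCVV')
parsed = parse_hnhdr(header)
parsed[:type]      # 1, the TYPES entry for 0xbc
parsed[:user_root] # 0x20
```

The sketch raises ArgumentError to stay self-contained, whereas load_root_header raises FormatError.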