Module: SmsBackupRenderer
- Defined in:
- lib/sms_backup_renderer.rb,
lib/sms_backup_renderer/models.rb,
lib/sms_backup_renderer/parser.rb,
lib/sms_backup_renderer/version.rb,
lib/sms_backup_renderer/renderer.rb
Defined Under Namespace
Classes: BasePage, ConversationPage, ImagePart, IndexPage, MediaPart, Message, MessagePart, Participant, TextPart, UnsupportedPart, VideoPart
Constant Summary
- HIGH_SURROGATES = 0xD800..0xDFFF
- VERSION = "0.1.2"
Class Method Summary
- .fix_surrogate_pairs(string) ⇒ Object
Although the files claim to be UTF-8, SMS Backup & Restore produces files that incorrectly represent characters such as emoji using surrogate pairs, such that a single character is represented by two adjacent, separately-escaped characters which are supposed to be interpreted as a single Unicode surrogate pair.
- .generate_html_from_archive(input_file_path, output_dir_path) ⇒ Object
- .mms_address_contact_names(mms) ⇒ Object
Build a hash of normalized addresses to contact names using information in an MMS XML record.
- .mms_outgoing_type?(type) ⇒ Boolean
- .mms_sender_addr_type?(type) ⇒ Boolean
- .parse(input, data_dir_path) ⇒ Object
- .parse_mms_combined_address(address_attribute) ⇒ Object
The XML for MMSes contains an ‘address’ attribute containing a list of addresses separated by tildes.
- .sms_outgoing_type?(type) ⇒ Boolean
Class Method Details
.fix_surrogate_pairs(string) ⇒ Object
Although the files claim to be UTF-8, SMS Backup & Restore produces files that incorrectly represent characters such as emoji using surrogate pairs, such that a single character is represented by two adjacent, separately-escaped characters which are supposed to be interpreted as a single Unicode surrogate pair. Nokogiri crashes when it encounters these, since it tries to interpret each part of the pair as a separate character. This method is a hacky workaround that simply searches the whole file for strings that look like escaped surrogate pairs and replaces them with the literal character they represent.
# File 'lib/sms_backup_renderer/parser.rb', line 16
def self.fix_surrogate_pairs(string)
  string.gsub!(/\&\#(\d{5})\;\&\#(\d{5})\;/) do |match|
    high = Regexp.last_match[1].to_i
    if HIGH_SURROGATES.include?(high)
      low = Regexp.last_match[2].to_i
      code_point = ((high - 0xD800) << 10) + (low - 0xDC00) + 0x010000
      [code_point].pack('U*')
    else
      match[0]
    end
  end
end
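The surrogate arithmetic is easier to follow with a concrete pair. The sketch below is a stricter variant for illustration, not the library's code: `decode_surrogate_escapes` is a hypothetical name, and it assumes the escapes are always five-digit decimal references. It differs from the listing above in two ways. It checks that the first value is a high surrogate (0xD800..0xDBFF) and the second a low surrogate (0xDC00..0xDFFF), whereas HIGH_SURROGATES spans both halves. It also returns the whole match when the pair is not valid, since inside a `gsub` block the yielded `match` is a String and `match[0]` is only its first character.

```ruby
# Hypothetical helper, not part of SmsBackupRenderer.
# Replaces escaped surrogate pairs such as "&#55357;&#56832;" with the
# single character they encode, and leaves everything else untouched.
def decode_surrogate_escapes(text)
  high_range = 0xD800..0xDBFF
  low_range  = 0xDC00..0xDFFF

  text.gsub(/&#(\d{5});&#(\d{5});/) do |match|
    high = Regexp.last_match(1).to_i
    low  = Regexp.last_match(2).to_i

    if high_range.cover?(high) && low_range.cover?(low)
      # Standard UTF-16 decoding: 10 bits from each half, offset by 0x10000.
      code_point = ((high - 0xD800) << 10) + (low - 0xDC00) + 0x10000
      [code_point].pack('U')
    else
      match # not a surrogate pair, keep the original text
    end
  end
end

# 55357 is 0xD83D and 56832 is 0xDE00, which combine to U+1F600.
decoded = decode_surrogate_escapes('hi &#55357;&#56832;!')
```

Unlike `fix_surrogate_pairs`, this variant returns a new string instead of mutating its argument, which avoids the `nil` return of `gsub!` when nothing matches.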
.generate_html_from_archive(input_file_path, output_dir_path) ⇒ Object
# File 'lib/sms_backup_renderer.rb', line 10
# NOTE: some local variable names are missing from this listing. The names
# `messages`, `conversations` and `conversation` are stand-ins, not confirmed.
def self.generate_html_from_archive(input_file_path, output_dir_path)
  input_tempfile = Tempfile.new('sms_backup_renderer')
  input_text = File.read(input_file_path)
  SmsBackupRenderer.fix_surrogate_pairs(input_text)
  File.write(input_tempfile.path, input_text)

  data_dir_path = File.join(output_dir_path, 'data')
  FileUtils.mkdir_p(data_dir_path)

  input_file = File.open(input_tempfile.path)
  messages = SmsBackupRenderer.parse(input_file, data_dir_path)
  input_file.close
  input_tempfile.close

  conversations = messages.group_by {|m| m.participants.reject(&:owner).map(&:normalized_address).sort}.values

  assets_dir_path = File.join(output_dir_path, 'assets')
  FileUtils.cp_r(File.join(File.dirname(__FILE__), 'sms_backup_renderer', 'assets'), output_dir_path)

  conversations_dir_path = File.join(output_dir_path, 'conversations')
  FileUtils.mkdir_p(conversations_dir_path)

  conversation_pages = conversations.map do |conversation|
    filename = ConversationPage.build_filename(conversation.first.participants.reject(&:owner))
    path = File.join(conversations_dir_path, filename)
    SmsBackupRenderer::ConversationPage.new(path, assets_dir_path, conversation)
  end
  conversation_pages.each(&:write)

  SmsBackupRenderer::IndexPage.new(
    File.join(output_dir_path, 'index.html'), assets_dir_path, conversation_pages).write
end
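The grouping step is the core of this method: messages belong to the same conversation when they share the same set of non-owner participants, regardless of order. A minimal sketch of that idea follows. The two Structs are hypothetical stand-ins for the library's Participant and Message classes and carry only the fields the grouping needs; the addresses are made up.

```ruby
# Hypothetical stand-ins for Participant and Message.
participant = Struct.new(:normalized_address, :owner)
message     = Struct.new(:body, :participants)

me    = participant.new('5550000', true)
alice = participant.new('5551111', false)
bob   = participant.new('5552222', false)

messages = [
  message.new('hi',      [me, alice]),
  message.new('group 1', [me, alice, bob]),
  message.new('group 2', [bob, alice, me]), # same people, different order
  message.new('hello',   [alice, me]),
]

# Key each message by the sorted addresses of everyone except the owner.
conversations = messages.group_by do |m|
  m.participants.reject(&:owner).map(&:normalized_address).sort
end

conversations.each do |addresses, group|
  puts "#{addresses.join(', ')}: #{group.map(&:body).join(' | ')}"
end
```

Sorting the addresses makes the key independent of participant order, so 'group 1' and 'group 2' land in the same conversation. The method above keeps only `.values`, since each group can recover its participants from its first message.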
.mms_address_contact_names(mms) ⇒ Object
Build a hash of normalized addresses to contact names using information in an MMS XML record. The data in the archive does not provide any explicit mapping of addresses to contact names, but at least for me it seems like the tilde-separated address attribute and the comma-separated contact_name attribute are provided in the same order, so we can try to use those to build a mapping. Obviously, this is error-prone, but seems better than nothing.
mms - nokogiri object representing the MMS element
Returns a Hash of String normalized addresses to String contact names.
# File 'lib/sms_backup_renderer/parser.rb', line 145
def self.mms_address_contact_names(mms)
  addresses = parse_mms_combined_address(mms.attr('address'))
  contact_names = mms.attr('contact_name').split(',').map(&:strip)

  # There may be more addresses than contact names. It seems like the addresses for unknown contacts
  # are placed at the end of the list. We'll omit them from the hash.
  addresses = addresses.take(contact_names.count)

  addresses.zip(contact_names).to_h
end
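The positional pairing can be tried without Nokogiri by working on the two attribute values directly. This is only a sketch: the attribute values are made up, and `normalize` is a hypothetical stand-in for `Participant.normalize_address`, whose real behavior is not shown on this page.

```ruby
# Hypothetical stand-in for Participant.normalize_address: keep digits only.
normalize = ->(address) { address.gsub(/\D/, '') }

# Attribute values as they might appear on an <mms> element (made-up data).
address_attr      = '+1 555-1111~5552222~5553333'
contact_name_attr = 'Alice Smith, Bob Jones'

addresses     = address_attr.split('~').map(&normalize)
contact_names = contact_name_attr.split(',').map(&:strip)

# Pair by position and drop trailing addresses that have no name.
names_by_address = addresses.take(contact_names.count).zip(contact_names).to_h

names_by_address.each { |address, name| puts "#{address} => #{name}" }
```

The third address has no matching name, so it is left out of the hash. If the two attributes were ever ordered differently, the pairing would silently be wrong, which is the error-prone part the description warns about.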
.mms_outgoing_type?(type) ⇒ Boolean
# File 'lib/sms_backup_renderer/parser.rb', line 116
def self.mms_outgoing_type?(type)
  case type
  when '132'
    false
  when '128'
    true
  else
    raise "Unrecognized MMS m_type #{type}"
  end
end
.mms_sender_addr_type?(type) ⇒ Boolean
# File 'lib/sms_backup_renderer/parser.rb', line 127
def self.mms_sender_addr_type?(type)
  case type
  when '137'
    true
  else
    false
  end
end
.parse(input, data_dir_path) ⇒ Object
# File 'lib/sms_backup_renderer/parser.rb', line 29
# NOTE: the name of the accumulator variable is missing from this listing.
# `messages` is a stand-in, not a confirmed name.
def self.parse(input, data_dir_path)
  messages = []
  Nokogiri::XML::Reader(input).each do |node|
    next unless node.node_type == Nokogiri::XML::Reader::TYPE_ELEMENT
    case node.name
    when 'sms'
      sms = Nokogiri::XML(node.outer_xml).at('/sms')
      outgoing = sms_outgoing_type?(sms.attr('type'))
      messages << Message.new(
        date_time: Time.strptime(sms.attr('date'), '%Q'),
        parts: sms.attr('body') ? [TextPart.new(sms.attr('body'))] : [],
        outgoing: outgoing,
        participants: [Participant.new(
          address: sms.attr('address'),
          name: sms.attr('contact_name'),
          owner: false,
          sender: !outgoing)],
        subject: sms.attr('subject'))
    when 'mms'
      mms = Nokogiri::XML(node.outer_xml).at('/mms')
      unless ['null', 'application/vnd.wap.multipart.related', 'application/vnd.wap.multipart.mixed'].include?(mms.attr('ct_t'))
        raise "Unrecognized MMS ct_t #{mms.attr('ct_t')}"
      end

      parts = mms.xpath('parts/part').map do |part|
        case part.attr('ct')
        when 'application/smil'
          # should probably use this, but I think I can get by without it
          nil
        when 'text/plain'
          TextPart.new(part.attr('text'))
        when /\Aimage\/(.+)\z/
          data = Base64.decode64(part.attr('data'))
          digest = Digest::MD5.hexdigest(data)
          path = File.join(data_dir_path, "#{digest}.#{$1}")
          File.write(path, data)
          ImagePart.new(part.attr('ct'), path)
        when /\Avideo\/(.+)\z/
          data = Base64.decode64(part.attr('data'))
          digest = Digest::MD5.hexdigest(data)
          path = File.join(data_dir_path, "#{digest}.#{$1}")
          File.write(path, data)
          VideoPart.new(part.attr('ct'), path)
        else
          UnsupportedPart.new(part.to_xml)
        end
      end.compact

      non_owner_addresses = parse_mms_combined_address(mms.attr('address'))
      address_contact_names = mms_address_contact_names(mms)
      participants = mms.xpath('addrs/addr').map do |addr|
        Participant.new(
          address: addr.attr('address'),
          name: address_contact_names[Participant.normalize_address(addr.attr('address'))],
          owner: !non_owner_addresses.include?(Participant.normalize_address(addr.attr('address'))),
          sender: mms_sender_addr_type?(addr.attr('type')))
      end

      # Some messages include the sender as a recipient as well; we don't want Participants
      # for those recipients since it would interfere with proper conversation grouping.
      if sender = participants.detect(&:sender)
        participants.delete_if { |p| !p.sender && p.normalized_address == sender.normalized_address}
      end

      messages << Message.new(
        date_time: Time.strptime(mms.attr('date'), '%Q'),
        outgoing: mms_outgoing_type?(mms.attr('m_type')),
        participants: participants,
        parts: parts)
    end
  end
  messages
end
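The image and video branches of `parse` share one pattern: decode the Base64 payload, name the file after the MD5 digest of its bytes, and take the extension from the content subtype. The sketch below isolates that pattern. `store_media_part` is a hypothetical helper, not a library method, and it uses `File.binwrite` rather than `File.write` as an assumption that binary-safe writing is wanted on every platform.

```ruby
require 'base64'
require 'digest'
require 'tmpdir'

# Hypothetical helper: writes a decoded media part into data_dir_path and
# returns its path, or nil when the content type is not image/* or video/*.
def store_media_part(content_type, base64_data, data_dir_path)
  extension = content_type[%r{\A(?:image|video)/(.+)\z}, 1]
  return nil unless extension

  data = Base64.decode64(base64_data)
  # Content-addressed name: identical bytes always map to the same file.
  path = File.join(data_dir_path, "#{Digest::MD5.hexdigest(data)}.#{extension}")
  File.binwrite(path, data)
  path
end

Dir.mktmpdir do |dir|
  payload = Base64.strict_encode64('placeholder bytes, not a real JPEG')
  first   = store_media_part('image/jpeg', payload, dir)
  second  = store_media_part('image/jpeg', payload, dir)
  puts first == second        # the same attachment is stored once
  puts Dir.children(dir).size # number of files written
end
```

Naming files by digest means an attachment that appears in several messages is written to a single file, at the cost of rewriting it each time it is encountered.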
.parse_mms_combined_address(address_attribute) ⇒ Object
The XML for MMSes contains an ‘address’ attribute containing a list of addresses separated by tildes. Although there are also separate ‘addr’ elements for each address, the combined attribute can be useful because it appears to exclude the owner of the archive’s address, and because the order can be correlated with the contact_name attribute.
address_attribute - the value of the ‘address’ attribute from the XML element for the MMS message
Returns an Array of String normalized addresses.
# File 'lib/sms_backup_renderer/parser.rb', line 164
def self.parse_mms_combined_address(address_attribute)
  address_attribute.split('~').map {|a| Participant.normalize_address(a)}
end
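Because the combined attribute appears to leave out the archive owner, it can be used to tell which ‘addr’ entry is the owner, which is how `parse` sets the `owner` flag. A small sketch of that check follows. The data is made up, and `normalize` is a hypothetical stand-in for `Participant.normalize_address`.

```ruby
# Hypothetical stand-in for Participant.normalize_address: keep digits only.
normalize = ->(address) { address.gsub(/\D/, '') }

# Made-up values: the combined 'address' attribute and the per-address
# 'addr' elements of one MMS record.
combined_attr  = '555-1111~555-2222'
addr_addresses = ['5550000', '5551111', '5552222']

non_owner_addresses = combined_attr.split('~').map(&normalize)

# Any addr entry missing from the combined attribute is treated as the owner.
owner_addresses = addr_addresses.reject do |address|
  non_owner_addresses.include?(normalize.call(address))
end

puts owner_addresses.inspect
```

Both sides are normalized before comparison, so differences in punctuation between the attribute and the elements do not affect the result.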
.sms_outgoing_type?(type) ⇒ Boolean
# File 'lib/sms_backup_renderer/parser.rb', line 105
def self.sms_outgoing_type?(type)
  case type
  when '1'
    false
  when '2'
    true
  else
    raise "Unrecognized SMS type #{type}"
  end
end