Class: Pidgin2Adium::BasicParser

Inherits:
Object
Includes:
Pidgin2Adium
Defined in:
lib/pidgin2adium/parsers/basic_parser.rb

Overview

BasicParser is a base class. Its subclasses are TextLogParser and HtmlLogParser.

Please use Pidgin2Adium.parse or Pidgin2Adium.parse_and_generate instead of using this class directly.

Direct Known Subclasses

HtmlLogParser, TextLogParser

Constant Summary

MINIMAL_TIME_REGEX =

Minimal times don’t have a date

/^\d{1,2}:\d{1,2}:\d{1,2}(?: [AP]M)?$/
TIME_REGEX_FIRST_LINE =

Time regexes must be set before pre_parse! runs. Captures month, day, and year: “4/18/2007 11:02:00 AM” => 4, 18, 2007. ONLY used (if at all) in the first line of a chat (“Conversation with…at…”).

%r{^(\d{1,2})/(\d{1,2})/(\d{4}) \d{1,2}:\d{2}:\d{2} [AP]M$}
TIME_REGEX =

Captures year, month, and day: “2007-04-17 12:33:13” => 2007, 04, 17.

/^(\d{4})-(\d{2})-(\d{2}) \d{2}:\d{2}:\d{2}$/
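
As a rough sketch of how the three patterns differ, the snippet below copies the regexes from this summary and checks them against the sample timestamps from the comments. The classify helper is hypothetical and is not part of Pidgin2Adium.

```ruby
# Regexes copied from the constants documented above.
MINIMAL_TIME_REGEX = /^\d{1,2}:\d{1,2}:\d{1,2}(?: [AP]M)?$/
TIME_REGEX_FIRST_LINE = %r{^(\d{1,2})/(\d{1,2})/(\d{4}) \d{1,2}:\d{2}:\d{2} [AP]M$}
TIME_REGEX = /^(\d{4})-(\d{2})-(\d{2}) \d{2}:\d{2}:\d{2}$/

# Hypothetical helper (not in the library): names the pattern a timestamp fits.
def classify(timestamp)
  case timestamp
  when MINIMAL_TIME_REGEX    then :minimal
  when TIME_REGEX            then :iso_like
  when TIME_REGEX_FIRST_LINE then :us_first_line
  else :unknown
  end
end

classify("11:02:00 AM")           # => :minimal
classify("2007-04-17 12:33:13")   # => :iso_like
classify("4/18/2007 11:02:00 AM") # => :us_first_line

# The capture groups carry the date parts as strings.
TIME_REGEX_FIRST_LINE.match("4/18/2007 11:02:00 AM").captures # => ["4", "18", "2007"]
TIME_REGEX.match("2007-04-17 12:33:13").captures              # => ["2007", "04", "17"]
```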

Constants included from Pidgin2Adium

ADIUM_LOG_DIR, BAD_DIRS, FILE_EXISTS, VERSION

Instance Method Summary

Methods included from Pidgin2Adium

balance_tags_c, delete_search_indexes, error, log_msg, oops, parse, parse_and_generate

Constructor Details

#initialize(src_path, user_aliases) ⇒ BasicParser

Returns a new instance of BasicParser.



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 38

def initialize(src_path, user_aliases)
  @src_path = src_path
  # Whitespace is removed for easy matching later on.
  @user_aliases = user_aliases.split(',').map!{|x| x.downcase.gsub(/\s+/,'') }.uniq
  # @user_alias is set each time get_sender_by_alias is called. It is a non-normalized
  # alias.
  # Set an initial value just in case the first message doesn't give
  # us an alias.
  @user_alias = user_aliases.split(',')[0]

  @log_file_is_valid = true
  begin
    file = File.new(@src_path, 'r')
    @first_line = file.readline
    @file_content = file.read
    file.close
  rescue Errno::ENOENT
    oops("#{@src_path} doesn't exist! Continuing...")
    @log_file_is_valid = false
    return nil
  end

  begin
    successfully_set_variables = pre_parse!
    if not successfully_set_variables
      error("Failed to set some key variables: #{@src_path}")
      @log_file_is_valid = false
      return
    end
  rescue InvalidFirstLineError
    # The first line isn't parseable
    @log_file_is_valid = false
    error("Failed to parse, invalid first line: #{@src_path}")
    return # stop processing
  end

  # @status_map, @lib_purple_events, and @events are used in
  # create_status_or_event_msg
  @status_map = {
    /(.+) logged in\.$/ => 'online',
    /(.+) logged out\.$/ => 'offline',
    /(.+) has signed on\.$/ => 'online',
    /(.+) has signed off\.$/ => 'offline',
    /(.+) has gone away\.$/ => 'away',
    /(.+) is no longer away\.$/ => 'available',
    /(.+) has become idle\.$/ => 'idle',
    /(.+) is no longer idle\.$/ => 'available'
  }

  # lib_purple_events are all of event_type libPurple
  @lib_purple_events = [
    # file transfer
    /Starting transfer of .+ from (.+)/,
    /^Offering to send .+ to (.+)$/,
    /(.+) is offering to send file/,
    /^Transfer of file .+ complete$/,
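    # Group the alternation so "Error" applies to all three verbs;
    # non-capturing, so no alias group is introduced.
    /Error (?:reading|writing|accessing) .+: .+/,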
    /You cancell?ed the transfer of/,
    /File transfer cancelled/,
    /(.+?) cancell?ed the transfer of/,
    /(.+?) cancelled the file transfer/,
    # Direct IM - actual (dis)connect events are their own types
    /^Attempting to connect to (.+) at .+ for Direct IM\./,
    /^Asking (.+) to connect to us at .+ for Direct IM\./,
    /^Attempting to connect via proxy server\.$/,
    /^Direct IM with (.+) failed/,
    # encryption
    /Received message encrypted with wrong key/,
    /^Requesting key\.\.\.$/,
    /^Outgoing message lost\.$/,
    /^Conflicting Key Received!$/,
    /^Error in decryption- asking for resend\.\.\.$/,
    /^Making new key pair\.\.\.$/,
    # sending errors
    /^Last outgoing message not received properly- resetting$/,
    /Resending\.\.\./,
    # connection errors
    /Lost connection with the remote user:.+/,
    # chats
    /^.+ entered the room\.$/,
    /^.+ left the room\.$/
  ]

  # non-libpurple events
  # Each key maps to an event_type string. The keys will be matched against a line of chat
  # and the partner's alias will be in regex group 1, IF the alias is matched.
  @event_map = {
    # .+ is not an alias, it's a proxy server so no grouping
    /^Attempting to connect to .+\.$/ => 'direct-im-connect',
    # NB: pidgin doesn't track when Direct IM is disconnected, AFAIK
    /^Direct IM established$/ => 'directIMConnected',
    /Unable to send message/ => 'chat-error',
    /You missed .+ messages from (.+) because they were too large/ => 'chat-error',
    /User information not available/ => 'chat-error'
  }

  @ignore_events = [
    # Adium ignores SN/alias changes.
    /^.+? is now known as .+?\.<br\/?>$/
  ]
end
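
The comment on @event_map notes that the partner's alias sits in regex group 1 only if the pattern captures one. A minimal sketch of that distinction follows, using two patterns copied from the map. The sample lines and the alias_from helper are made up for illustration.

```ruby
# Two entries copied from @event_map above.
event_map = {
  /^Direct IM established$/ => 'directIMConnected',
  /You missed .+ messages from (.+) because they were too large/ => 'chat-error'
}

# Hypothetical helper: returns [event_type, alias_or_nil] for a line, or nil.
def alias_from(event_map, line)
  regex, event_type = event_map.detect { |rxp, _type| line =~ rxp }
  return nil unless regex
  match = regex.match(line)
  # MatchData#size counts the whole match plus each group, so a size of 1
  # means the pattern captured no alias.
  [event_type, match.size == 1 ? nil : match[1]]
end

alias_from(event_map, "Direct IM established")
# => ["directIMConnected", nil]
alias_from(event_map, "You missed 2 messages from Buddy because they were too large")
# => ["chat-error", "Buddy"]
```

In the constructor's design, a nil alias is what lets create_status_or_event_msg attribute the event to the user rather than the partner.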

Instance Method Details

#create_adium_time(time) ⇒ Object

Converts a pidgin datestamp to an Adium one. Returns a string representation of time or nil if it couldn’t parse the provided time.



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 226

def create_adium_time(time)
  return nil if time.nil?
  if is_minimal_time?(time)
    datetime = try_to_parse_minimal_time(time)
  else
    begin
      datetime = DateTime.parse(time)
    rescue ArgumentError
      datetime = try_to_parse_time(time)
      if datetime.nil?
        Pidgin2Adium.oops("#{time} couldn't be parsed. Please open an issue on GitHub: https://github.com/gabebw/pidgin2adium/issues")
        return nil
      end
    end
  end

  return nil if datetime.nil?

  # Instead of dealing with Ruby 1.9 vs Ruby 1.8, DateTime vs Date vs
  # Time, and #xmlschema vs #iso8601, just use strftime.
  datetime.strftime('%Y-%m-%dT%H:%M:%S%Z')
end
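
To give a sense of the output format, here is a small sketch that applies the same strftime pattern to a timestamp that DateTime.parse can read directly. It relies only on Ruby's standard date library. The input carries no zone, so DateTime falls back to a zero offset.

```ruby
require 'date'

# Same format string that create_adium_time passes to strftime.
adium_format = '%Y-%m-%dT%H:%M:%S%Z'

datetime = DateTime.parse("2007-04-17 12:33:13")
adium_time = datetime.strftime(adium_format)
# => "2007-04-17T12:33:13+00:00"
```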

#create_msg(matches) ⇒ Object

create_msg takes an array of captures from matching against @line_regex and returns an XMLMessage or AutoReplyMessage, or nil if the timestamp cannot be parsed. It can be used for TextLogParser and HtmlLogParser because both of them return data in the same indexes in the matches array.



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 355

def create_msg(matches)
  msg = nil
  # Either a regular message line or an auto-reply/away message.
  time = create_adium_time(matches[0])
  return nil if time.nil?
  buddy_alias = matches[1]
  sender = get_sender_by_alias(buddy_alias)
  body = matches[3]
  if matches[2] # auto-reply
    msg = AutoReplyMessage.new(sender, time, buddy_alias, body)
  else
    # normal message
    msg = XMLMessage.new(sender, time, buddy_alias, body)
  end
  return msg
end

#create_status_or_event_msg(matches) ⇒ Object

create_status_or_event_msg takes an array of MatchData captures from matching against @line_regex_status and returns an Event or Status. Returns nil if it’s a message that should be ignored, or false if an error occurred.



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 378

def create_status_or_event_msg(matches)
  # ["22:58:00", "BuddyName logged in."]
  # 0: time
  # 1: status message or event
  msg = nil
  time = create_adium_time(matches[0])
  return nil if time.nil?
  str = matches[1]
  # Return nil, which will get compact'ed out
  return nil if @ignore_events.detect{|regex| str =~ regex }

  regex, status = @status_map.detect{|rxp, stat| str =~ rxp}
  if regex and status
    # Status message
    buddy_alias = regex.match(str)[1]
    sender = get_sender_by_alias(buddy_alias)
    msg = StatusMessage.new(sender, time, buddy_alias, status)
  else
    # Test for event
    regex = @lib_purple_events.detect{|rxp| str =~ rxp }
    event_type = 'libpurpleEvent' if regex
    unless regex and event_type
      # not a libpurple event, try others
      regex, event_type = @event_map.detect{|rxp,ev_type| str =~ rxp}
      unless regex and event_type
        error(sprintf("Error parsing status or event message, no status or event found: %p", str))
        return false
      end
    end

    if regex and event_type
      regex_matches = regex.match(str)
      # Event message
      if regex_matches.size == 1
        # No alias - this means it's the user
        buddy_alias = @user_alias
        sender = @user_SN
      else
        buddy_alias = regex_matches[1]
        sender = get_sender_by_alias(buddy_alias)
      end
      msg = Event.new(sender, time, buddy_alias, str, event_type)
    end
  end
  return msg
end
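
The status lookup above relies on Hash#detect returning a key/value pair. A minimal sketch of that step in isolation follows, with a few entries copied from @status_map. The detect_status helper is hypothetical.

```ruby
# A few entries copied from @status_map in the constructor.
status_map = {
  /(.+) logged in\.$/ => 'online',
  /(.+) has gone away\.$/ => 'away',
  /(.+) is no longer idle\.$/ => 'available'
}

# Hypothetical helper: returns [buddy_alias, status], or nil when nothing matches.
def detect_status(status_map, str)
  # Hash#detect yields [key, value] pairs and returns the first pair for which
  # the block is truthy, so the pair destructures into regex and status.
  regex, status = status_map.detect { |rxp, _stat| str =~ rxp }
  return nil unless regex
  [regex.match(str)[1], status]
end

detect_status(status_map, "BuddyName logged in.")   # => ["BuddyName", "online"]
detect_status(status_map, "BuddyName is typing...") # => nil
```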

#get_sender_by_alias(alias_name) ⇒ Object



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 338

def get_sender_by_alias(alias_name)
  # Strip a leading "***" from the alias before comparing it to the user's aliases.
  no_action = alias_name.sub(/^\*{3}/, '')
  if @user_aliases.include? no_action.downcase.gsub(/\s+/, '')
    # Set the current alias being used of the ones in @user_aliases
    @user_alias = no_action
    return @user_SN
  else
    return @partner_SN
  end
end
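
The lookup works because both sides are normalized the same way: lowercased, with all whitespace removed. A sketch of that normalization under made-up aliases follows. The normalize_alias helper is hypothetical, since the library inlines this logic.

```ruby
# Hypothetical helper mirroring the normalization applied in the constructor
# and in get_sender_by_alias: lowercase, then strip all whitespace.
def normalize_alias(name)
  name.downcase.gsub(/\s+/, '')
end

# Same shape as the constructor's handling of the comma-separated alias list.
user_aliases = "My Alias,my alias,Other Me".split(',').map { |a| normalize_alias(a) }.uniq
# => ["myalias", "otherme"]

# A leading "***" is removed before the comparison, as in get_sender_by_alias.
spoken_as = "***My Alias".sub(/^\*{3}/, '')
user_aliases.include?(normalize_alias(spoken_as))    # => true
user_aliases.include?(normalize_alias("Some Buddy")) # => false
```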

#is_minimal_time?(str) ⇒ Boolean

Returns true if the time is minimal, i.e. doesn’t include a date. Otherwise returns false.

Returns:

  • (Boolean)


# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 219

def is_minimal_time?(str)
  not str.strip.match(MINIMAL_TIME_REGEX).nil?
end

#parse ⇒ Object

This method returns a LogFile instance, or false if an error occurred.



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 141

def parse
  # Prevent parse from being called directly from BasicParser, since
  # it uses subclassing magic.
  if self.class == BasicParser
    oops("Please don't call parse directly from BasicParser. Use a subclass :)")
    return false
  end
  return false unless @log_file_is_valid
  @file_content = cleanup(@file_content).split("\n")

  @file_content.map! do |line|
    # "next" returns nil which is removed by compact
    next if line =~ /^\s+$/
    if line =~ @line_regex
      create_msg($~.captures)
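    elsif line =~ @line_regex_status
      msg = create_status_or_event_msg($~.captures)
      # Error occurred while parsing
      return false if msg == false
      # Make the message the block's value; without this the modifier-if
      # above evaluates to nil and every status/event line is compacted away.
      msg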
    else
      error "Could not parse line:"
      p line
      return false
    end
  end
  @file_content.compact!
  return LogFile.new(@file_content, @service, @user_SN, @partner_SN, @adium_chat_time_start)
end
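
The map-then-compact pattern in parse can be sketched on its own. The two regexes below are hypothetical stand-ins, since the real @line_regex and @line_regex_status are defined by the subclasses and are not shown on this page.

```ruby
def parse_lines(lines)
  # Hypothetical stand-in patterns, for illustration only.
  line_regex = /^\((\d{2}:\d{2}:\d{2})\) (.+?): (.*)$/
  status_regex = /^\((\d{2}:\d{2}:\d{2})\) (.+)$/

  parsed = lines.map do |line|
    # A bare "next" makes the block yield nil for this element.
    next if line =~ /^\s+$/
    if line =~ line_regex
      [:message, *$~.captures]
    elsif line =~ status_regex
      [:status, *$~.captures]
    else
      # "return" inside the block exits the whole method, as in parse.
      return false
    end
  end
  # compact drops the nils left behind by skipped lines.
  parsed.compact
end

parse_lines(["(22:58:00) Buddy: hi", "   ", "(22:59:00) Buddy logged out."])
# => [[:message, "22:58:00", "Buddy", "hi"], [:status, "22:59:00", "Buddy logged out."]]
parse_lines(["not a log line"]) # => false
```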

#pre_parse! ⇒ Object

Extracts required data from the first line of the file. Called by #initialize. Sets these variables:

  • @service

  • @user_SN

  • @partner_SN

  • @basic_time_info

  • @adium_chat_time_start

Returns true if none of these variables are false or nil.



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 257

def pre_parse!
  # Deal with first line.

  # the first line is special. It tells us (in order of regex groups):
  # 1) who we're talking to
  # 2) what time/date
  # 3) what SN we used
  # 4) what protocol (AIM, icq, jabber...)
  first_line_match = /Conversation with (.+?) at (.+?) on (.+?) \((.+?)\)/.match(@first_line)
  if first_line_match.nil?
    raise InvalidFirstLineError
  else
    # first_line_match is like so:
    # ["Conversation with BUDDY_PERSON at 2006-12-21 22:36:06 on awesome SN (aim)",
    #  "BUDDY_PERSON",
    #  "2006-12-21 22:36:06",
    #  "awesome SN",
    #  "aim"]
    @service = first_line_match[4]
    # @user_SN is normalized to avoid "AIM.name" and "AIM.na me" folders
    @user_SN = first_line_match[3].downcase.tr(' ', '')
    @partner_SN = first_line_match[1]
    pidgin_chat_time_start = first_line_match[2]
    # @basic_time_info is for files that only have the full
    # timestamp at the top; we can use it to fill in the minimal
    # per-line timestamps. It is a hash with 3 keys:
    # * :year
    # * :mon
    # * :mday (day of month)
    # You should be able to fill everything else in. If you can't,
    # something's wrong.
    @basic_time_info = case pidgin_chat_time_start
                       when TIME_REGEX
                         {:year => $1.to_i,
                          :mon => $2.to_i,
                          :mday => $3.to_i}
                       when TIME_REGEX_FIRST_LINE
                         {:year => $3.to_i,
                          :mon => $1.to_i,
                          :mday => $2.to_i}
                       else
                         nil
                       end
    if @basic_time_info.nil?
      begin
        parsed_time = DateTime.parse(pidgin_chat_time_start)
        @basic_time_info = {:year => parsed_time.year,
                            :mon => parsed_time.mon,
                            :mday => parsed_time.mday}
      rescue ArgumentError
        # Couldn't parse the date
        Pidgin2Adium.oops("#{@src_path}: couldn't parse the date in the first line.")
        @basic_time_info = nil
      end
    end

    # Note: need @basic_time_info set for create_adium_time
    # When the chat started, in Adium's format
    @adium_chat_time_start = create_adium_time(pidgin_chat_time_start)

    first_line_variables = [@service,
                            @user_SN,
                            @partner_SN,
                            @basic_time_info,
                            @adium_chat_time_start]
    if first_line_variables.all?
      true
    else
      # Print an informative error message
      unset_variable_names = []
      unset_variable_names << 'service' if @service.nil?
      unset_variable_names << 'user_SN' if @user_SN.nil?
      unset_variable_names << 'partner_SN' if @partner_SN.nil?
      unset_variable_names << 'basic_time_info' if @basic_time_info.nil?
      unset_variable_names << 'adium_chat_time_start' if @adium_chat_time_start.nil?
      Pidgin2Adium.oops("Couldn't set these variables: #{unset_variable_names.join(', ')}")
      false
    end
  end
end
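
As an illustration of the first-line handling, the sketch below runs the same regex against the sample line from the comments in pre_parse!. The local variable names are chosen for the example and stand in for the instance variables.

```ruby
# Regex copied from pre_parse!; the sample line comes from its comments.
first_line = "Conversation with BUDDY_PERSON at 2006-12-21 22:36:06 on awesome SN (aim)"
match = /Conversation with (.+?) at (.+?) on (.+?) \((.+?)\)/.match(first_line)

partner_sn = match[1]                      # => "BUDDY_PERSON"
chat_start = match[2]                      # => "2006-12-21 22:36:06"
user_sn    = match[3].downcase.tr(' ', '') # => "awesomesn"
service    = match[4]                      # => "aim"

# Date parts pulled out the same way the TIME_REGEX branch does.
date_match = /^(\d{4})-(\d{2})-(\d{2}) \d{2}:\d{2}:\d{2}$/.match(chat_start)
basic_time_info = { :year => date_match[1].to_i,
                    :mon  => date_match[2].to_i,
                    :mday => date_match[3].to_i }
# year 2006, month 12, day 21
```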

#strptime(time, format) ⇒ Object

Returns a Time object, or nil if the format string doesn’t match the time string.



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 172

def strptime(time, format)
  date_hash = Date._strptime(time, format)
  return nil if date_hash.nil?
  # Fill in any blanks using @basic_time_info
  date_hash = @basic_time_info.merge(date_hash)
  time = Time.local(date_hash[:year], date_hash[:mon], date_hash[:mday],
                    date_hash[:hour], date_hash[:min], date_hash[:sec],
                    date_hash[:sec_fraction], date_hash[:zone])
  time
end
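
The "fill in any blanks" step can be sketched outside the class. The hash below is a stand-in for @basic_time_info as pre_parse! would set it; everything else uses Ruby's standard date library.

```ruby
require 'date'

# Stand-in for @basic_time_info as set by pre_parse!.
basic_time_info = { :year => 2006, :mon => 12, :mday => 21 }

# Date._strptime returns a hash of only the fields it found, or nil.
date_hash = Date._strptime("23:01:45", "%H:%M:%S")
# date_hash has :hour, :min and :sec but no date fields.

# merge lets parsed fields win and falls back to the first-line date.
filled = basic_time_info.merge(date_hash)
time = Time.local(filled[:year], filled[:mon], filled[:mday],
                  filled[:hour], filled[:min], filled[:sec])

no_match = Date._strptime("not a time", "%H:%M:%S") # => nil
```

Because the parsed hash is merged over the defaults, a full timestamp on a line overrides the first-line date, while a minimal timestamp inherits it.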

#try_to_parse_minimal_time(minimal_time) ⇒ Object



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 208

def try_to_parse_minimal_time(minimal_time)
  formats = [
    "%I:%M:%S %P", # 04:01:45 AM
    "%H:%M:%S" # 23:01:45
  ]

  try_to_parse_time_with_formats(minimal_time, formats)
end

#try_to_parse_time(time) ⇒ Object



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 196

def try_to_parse_time(time)
  formats = [
    "%m/%d/%Y %I:%M:%S %P",       # 01/22/2008 03:01:45 PM
    "%Y-%m-%d %H:%M:%S",          # 2008-01-22 23:08:24
    "%Y/%m/%d %H:%M:%S",          # 2008/01/22 04:01:45
    '%a %d %b %Y %H:%M:%S %p %Z', # "Sat 18 Apr 2009 10:43:35 AM PDT"
    '%a %b %d %H:%M:%S %Y'        # "Wed May 24 19:00:33 2006"
  ]
  try_to_parse_time_with_formats(time, formats)
end

#try_to_parse_time_with_formats(time, formats) ⇒ Object

Tries to parse time (a string) according to the formats in formats, which should be an array of strings. For more on acceptable format strings, see the official documentation for Time.strptime. Returns a Time object or nil (if no formats matched).



# File 'lib/pidgin2adium/parsers/basic_parser.rb', line 187

def try_to_parse_time_with_formats(time, formats)
  parsed = nil
  formats.each do |format|
    parsed = strptime(time, format)
    break unless parsed.nil?
  end
  parsed
end
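
A minimal sketch of the try-each-format loop follows, assuming a simplified stand-in for strptime that returns the parsed field hash instead of a Time. The first_matching_format helper is hypothetical.

```ruby
require 'date'

# Hypothetical simplification: returns [format, parsed_fields] for the first
# format that matches, or nil if none do.
def first_matching_format(time, formats)
  formats.each do |format|
    parsed = Date._strptime(time, format)
    return [format, parsed] unless parsed.nil?
  end
  nil
end

formats = ["%m/%d/%Y %I:%M:%S %P", "%Y-%m-%d %H:%M:%S"]

format, fields = first_matching_format("2008-01-22 23:08:24", formats)
# format is "%Y-%m-%d %H:%M:%S"; fields include :year, :mon, :mday, :hour, :min, :sec
first_matching_format("yesterday-ish", formats) # => nil
```

Order matters in this design: the first format that parses wins, so more specific formats belong earlier in the list.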