Class: ExtendedEmailReplyParser::Parsers::Base

Inherits:
Object
Defined in:
lib/extended_email_reply_parser/parsers/base.rb

Direct Known Subclasses

Github, HtmlMails, I18nDe, I18nEn

Constructor Details

#initialize(text_before_parsing) ⇒ Base

Returns a new instance of Base.

# File 'lib/extended_email_reply_parser/parsers/base.rb', line 7

def initialize(text_before_parsing)
  self.text = text_before_parsing

  # The `EmailReplyParser::Email` is extended in this gem.
  # Have a look at:
  #
  #   lib/extended_email_reply_parser/email_reply_parser/email.rb
  #
  @email = EmailReplyParser::Email.new.read(text)
end

Instance Attribute Details

#text ⇒ Object

Returns the value of attribute text.

# File 'lib/extended_email_reply_parser/parsers/base.rb', line 5

def text
  @text
end

Class Method Details

.add_quote_header_regex(regex_string) ⇒ Object

The github parser (github.com/github/email_reply_parser) needs to know how to identify the header line that introduces a quote, for example:

"On Tue, 2011-03-01 at 18:02 +0530, Abhishek Kona wrote"

Example email:

Hi,

On Tue, 2011-03-01 at 18:02 +0530, Abhishek Kona wrote:
> Hi folks
>
> What is the best way to clear a Riak bucket of all key, values after
> running a test?
> I am currently using the Java HTTP API.

You can list the keys for the bucket and call delete for each. Or if you
put the keys (and kept track of them in your test) you can delete them
one at a time (without incurring the cost of calling list first.)

By default, the github parser uses the regex `/^On .* wrote:$/` for this. To make it recognize other header lines, register their patterns with `add_quote_header_regex`.

Since the github parser needs these patterns possibly before the `parse` method of your custom parser runs, register the quote header regex at class level, i.e. in the class body:

module ExtendedEmailReplyParser
  class Parsers::I18nDe < Parsers::Base
    add_quote_header_regex '^Am .* schrieb.*$'
    # ...
  end
end

# File 'lib/extended_email_reply_parser/parsers/base.rb', line 167

def self.add_quote_header_regex(regex_string)
  @@quote_header_regexes << regex_string
end
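Because `@@quote_header_regexes` is a class variable, it is shared by `Parsers::Base` and all of its subclasses, which is presumably why a single call in a subclass body is enough. The following self-contained sketch reproduces only that registry part with hypothetical classes (`SketchBase`, `SketchI18nDe`, not part of the gem) and assumes the list starts out with the English pattern mentioned above.

```ruby
# Hypothetical stand-in for Parsers::Base, reduced to the regex registry.
class SketchBase
  # A @@ class variable is shared by SketchBase and every subclass.
  # Assumption: the list starts with the English quote header pattern.
  @@quote_header_regexes = ['^On .* wrote:$']

  def self.add_quote_header_regex(regex_string)
    @@quote_header_regexes << regex_string
  end

  def self.quote_header_regexes
    @@quote_header_regexes
  end
end

# Hypothetical subclass: this line runs once, when the class body is
# evaluated, i.e. before any parser instance is created.
class SketchI18nDe < SketchBase
  add_quote_header_regex '^Am .* schrieb.*$'
end
```

After `SketchI18nDe` is loaded, `SketchBase.quote_header_regexes` contains both patterns, because the subclass appended to the same shared array.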

.quote_header_regexes ⇒ Object

Returns the list of regex strings used to recognize quote header lines such as:

  • “On … wrote:” (English)
  • “Am … schrieb …:” (German)
  • …

# File 'lib/extended_email_reply_parser/parsers/base.rb', line 128

def self.quote_header_regexes
  @@quote_header_regexes
end
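The registered entries are plain strings, not `Regexp` objects. A caller that wants to test one line against all of them could compile and combine them as in this sketch; `QUOTE_HEADER_PATTERNS` and `quote_header_line?` are hypothetical names, and how the gem itself consumes the list is not shown in this documentation.

```ruby
# Example contents, mirroring the patterns quoted in this documentation
# (the gem's actual list may differ).
QUOTE_HEADER_PATTERNS = ['^On .* wrote:$', '^Am .* schrieb.*$'].freeze

# Compile each string first: Regexp.union escapes plain string arguments,
# so passing the strings directly would match them literally.
QUOTE_HEADER_REGEX = Regexp.union(QUOTE_HEADER_PATTERNS.map { |s| Regexp.new(s) })

# Hypothetical helper: true if the line looks like a quote header.
def quote_header_line?(line)
  QUOTE_HEADER_REGEX.match?(line.chomp)
end
```

For instance, `quote_header_line?("On Tue, 2011-03-01 at 18:02 +0530, Abhishek Kona wrote:")` is true, while an ordinary body line such as `"Hi folks"` is not matched.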

.subclasses ⇒ Object

Returns every loaded class that inherits from this class, directly or indirectly.

# File 'lib/extended_email_reply_parser/parsers/base.rb', line 171

def self.subclasses
  ObjectSpace.each_object(Class).select { |klass| klass < self }
end
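`ObjectSpace.each_object(Class)` walks every class currently loaded, and `klass < self` is true only for descendants (it returns `false` for the class itself and `nil` for unrelated classes), so the method finds direct and indirect subclasses alike. A self-contained sketch with hypothetical classes:

```ruby
# Hypothetical classes, used only to show what the ObjectSpace scan returns.
class SketchParser
  # Every loaded class that inherits from self, directly or indirectly.
  def self.subclasses
    ObjectSpace.each_object(Class).select { |klass| klass < self }
  end
end

class SketchGithub < SketchParser; end
class SketchI18nEn < SketchParser; end
class SketchI18nEnVariant < SketchI18nEn; end # indirect descendant
```

The order of the result is not guaranteed, and only classes that have already been loaded are found. Since Ruby 3.1, `Class#subclasses` is built in and returns direct subclasses only; a `self.subclasses` definition like this one shadows it for the class hierarchy in question.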

Instance Method Details

#except_in_visible_block_quotes(&block) ⇒ Object

To avoid cutting off the email within a visible quote, wrap the `hide_everything_after` calls in an `except_in_visible_block_quotes` block:

module ExtendedEmailReplyParser
  class Parsers::I18nEn < Parsers::Base
    def parse
      except_in_visible_block_quotes do
        hide_everything_after ["From: ", "Sent: ", "To: "]
      end
      # ...
    end
  end
end

Otherwise, the following email would be completely cut off after “Hi Chris,”.

Hi Chris,

> From: Chris <[email protected]>
> Sent: Saturday, July 09, 2016 3:27 PM
> To: John <[email protected]>
> Subject: The solution!
>
> Hi John,
> I've just found a solution to our big problem!

this is great, thanks!
Cheers, John

# File 'lib/extended_email_reply_parser/parsers/base.rb', line 59

def except_in_visible_block_quotes(&block)
  @email.except_in_visible_block_quotes(&block)
  return @email.visible_text
end
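One way to get this behavior, sketched here under the assumption that a visible quote is any line starting with `>`: blank out the quoted lines, search the masked text, and cut the original at the match offset. `cut_outside_block_quotes` is a hypothetical helper, not the gem's implementation, which delegates to `EmailReplyParser::Email`.

```ruby
# Hypothetical helper (not the gem's API): truncate `text` at the first match
# of `regex`, ignoring any match that lies inside a ">" quoted line.
def cut_outside_block_quotes(text, regex)
  # Replace every character of a quoted line with a space (keeping the
  # newline), so character offsets still line up with the original text.
  masked = text.lines.map { |line|
    line.start_with?(">") ? line.gsub(/[^\n]/, " ") : line
  }.join

  match = regex.match(masked)
  match ? text[0...match.begin(0)] : text
end

header = /(From: .*?Sent: .*?To: .*?\n)/m
email  = "Hi Chris,\n\n> From: Chris\n> Sent: Saturday\n> To: John\n>\n> Hi John,\n\nthis is great, thanks!\n"
cut_outside_block_quotes(email, header) # quoted header lines are ignored, text is kept
```

Searching the unmasked text instead would match the quoted `From:` line and cut the reply right after "Hi Chris,", which is the failure described above.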

#hide_everything_after(expressions) ⇒ Object

Some email clients do not quote the previous conversation.

Hi Chris,
this is great, thanks!
Cheers, John

From: Chris <[email protected]>
Sent: Saturday, July 09, 2016 3:27 PM
To: John <[email protected]>
Subject: The solution!

Hi John,
I've just found a solution to our big problem!
...

To remove the previous conversation, tell the parser which expressions mark where the previous conversation starts:

module ExtendedEmailReplyParser
  class Parsers::I18nEn < Parsers::Base
    def parse
      except_in_visible_block_quotes do
        hide_everything_after ["From: ", "Sent: ", "To: "]
      end
      # ...
    end
  end
end

The parser combines the expressions into a regex:

/(#{expressions.join(".*?")}.*?\n)/m

for example:

/(From: .*?Sent: .*?To: .*?\n)/m

# File 'lib/extended_email_reply_parser/parsers/base.rb', line 119

def hide_everything_after(expressions)
  @email.hide_everything_after(expressions)
  return @email.visible_text
end
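A minimal sketch of the documented behavior, assuming that "hide" can be approximated by truncating the string at the start of the match (the gem instead hides fragments of its `EmailReplyParser::Email` and returns `@email.visible_text`). `hide_everything_after_sketch` is a hypothetical name.

```ruby
# Hypothetical stand-in for hide_everything_after: build the regex the way
# the documentation describes, then drop everything from the first match on.
def hide_everything_after_sketch(text, expressions)
  # e.g. ["From: ", "Sent: ", "To: "] => /(From: .*?Sent: .*?To: .*?\n)/m
  regex = /(#{expressions.join(".*?")}.*?\n)/m
  match = regex.match(text)
  match ? text[0...match.begin(0)] : text
end

email = "Hi Chris,\nthis is great, thanks!\nCheers, John\n\n" \
        "From: Chris\nSent: Saturday, July 09, 2016 3:27 PM\nTo: John\n" \
        "Subject: The solution!\n\nHi John,\n"
hide_everything_after_sketch(email, ["From: ", "Sent: ", "To: "])
```

Because of the `m` flag, `.` also matches newlines, so the expressions may sit on consecutive lines. The expressions are interpolated unescaped; if one of them contains regex metacharacters, wrapping it in `Regexp.escape` would make it match literally.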

#parse ⇒ Object

The `parse` method of `Parsers::Base` is meant to be overridden by the individual parsers; the default implementation returns the text unchanged.

The text before parsing is available via `text`. `parse` is expected to return the parsed text.

# File 'lib/extended_email_reply_parser/parsers/base.rb', line 24

def parse
  return text
end
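A self-contained sketch of this contract, using hypothetical stand-in classes instead of the gem: the base stores the text and returns it unchanged, and a subclass overrides `parse` to return a modified copy. In real code, the subclass would inherit from `Parsers::Base` inside `module ExtendedEmailReplyParser`, as in the examples above. `SignatureStripper` is made up for illustration; it removes a conventional `-- ` signature block.

```ruby
# Hypothetical stand-in for Parsers::Base, reduced to the parse contract.
class SketchParserBase
  attr_accessor :text

  def initialize(text_before_parsing)
    self.text = text_before_parsing
  end

  # Default behavior: hand back the text unchanged.
  def parse
    text
  end
end

# Hypothetical custom parser: drops a "-- " signature line and everything
# after it, and returns the remaining text.
class SignatureStripper < SketchParserBase
  def parse
    text.sub(/^-- \n.*\z/m, "")
  end
end

SignatureStripper.new("Thanks!\n-- \nJohn\n").parse
```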

#remove_empty_lines_between_block_quote_lines ⇒ Object

Boil quotes like this one

> Hi,

> how are you doing?

> Cheers

down to

> Hi,
> how are you doing?
> Cheers

# File 'lib/extended_email_reply_parser/parsers/base.rb', line 79

def remove_empty_lines_between_block_quote_lines
  @email.remove_empty_lines_between_block_quote_lines
  return @email.visible_text
end
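A string-only sketch of the same transformation, assuming that quoted lines start with `>` and that whitespace-only lines count as empty. `squeeze_block_quote_sketch` is a hypothetical name; the gem performs this step on its `EmailReplyParser::Email` object instead.

```ruby
# Hypothetical stand-in for remove_empty_lines_between_block_quote_lines:
# drop blank (or whitespace-only) lines that sit between two quoted lines.
def squeeze_block_quote_sketch(text)
  # ^(>[^\n]*\n)    a quoted line, captured together with its newline
  # (?:[ \t]*\n)+   one or more blank or whitespace-only lines
  # (?=>)           followed by another quoted line (not consumed)
  text.gsub(/^(>[^\n]*\n)(?:[ \t]*\n)+(?=>)/, '\1')
end

squeeze_block_quote_sketch("> Hi,\n\n> how are you doing?\n\n> Cheers\n")
```

The lookahead leaves the next quoted line unconsumed, so consecutive gaps are all removed in a single `gsub` pass, while a blank line between a quote and ordinary text is kept.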