Top Level Namespace

Defined Under Namespace

Modules: Ramparts

Classes: EmailParser, PhoneParser, UrlParser

Constant Summary

ARGUMENT_ERROR_TEXT =
'Parameter 1, the block of text to parse, is not a string'.freeze
MR_ALGO =

The map-reduce (MR) algorithm. It is roughly 2x faster than the glorified regex (GR) algorithm. It maps parts of the text such as ‘at’ or ‘FOUR’ down to ‘@’ and ‘4’, removes spaces and similar noise, and then runs a simple regex over the remainder. Because this normalization loses information, it cannot return the indices of matches.

'MR'.freeze
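
The map-then-match idea described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the library's implementation: the word table, the regex, and the method names `mr_normalize` and `mr_count_emails` are all hypothetical.

```ruby
# Hypothetical word-to-symbol table, for illustration only.
SKETCH_WORD_MAP = {
  'at' => '@',
  'dot' => '.',
  'four' => '4'
}.freeze

# Hypothetical simple pattern that is run over the normalized text.
SKETCH_EMAIL_REGEX = /[a-z0-9._-]+@[a-z0-9-]+\.[a-z]+/

# Map step: lowercase, swap known words for symbols, drop all whitespace.
def mr_normalize(text)
  text.downcase.split(/\s+/).map { |word| SKETCH_WORD_MAP.fetch(word, word) }.join
end

# Reduce step: count matches in the normalized text.
def mr_count_emails(text)
  mr_normalize(text).scan(SKETCH_EMAIL_REGEX).length
end

normalized = mr_normalize('john AT gmail DOT com')
count = mr_count_emails('reach me: john at gmail dot com')
```

Once the text has been normalized, character positions no longer line up with the original input, which is why an approach like this can count or detect matches but cannot report where they were.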
GR_ALGO =

The glorified regex (GR) algorithm. A complicated but robust regex that makes a single pass over the text. Because the regex is so complicated, it is slower than the map-reduce algorithm. No information is lost, so it can return the indices at which phone numbers and other matches occur.

'GR'.freeze
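
As a rough illustration of the single-pass approach, the sketch below matches an obfuscated address directly in the original text and reports its offsets. The pattern `SKETCH_GR_EMAIL_REGEX` and the method `gr_find_email` are hypothetical and far simpler than anything the library would need.

```ruby
# Hypothetical pattern that tolerates 'at' / 'dot' and spaces in place.
SKETCH_GR_EMAIL_REGEX =
  /[a-z0-9._-]+\s*(?:@|\bat\b)\s*[a-z0-9-]+\s*(?:\.|\bdot\b)\s*[a-z]+/i

# Returns the first match with its offsets in the original text, or nil.
def gr_find_email(text)
  match = SKETCH_GR_EMAIL_REGEX.match(text)
  return nil unless match

  { start_offset: match.begin(0), end_offset: match.end(0), value: match[0] }
end

found = gr_find_email('mail john at gmail dot com now')
# found[:value] is 'john at gmail dot com', spanning offsets 5...26
```

Because the text is never rewritten, the offsets refer to the caller's original string and can be used for in-place replacement.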
EMAIL_DOMAINS =
%w[
  gmail
  yahoo
  hotmail
  aol
  icloud
  live
  outlook
  ymail
  comcast
  shaw
  rogers
  msn
  mail
  me
  att
  careguide
  sbcglobal
  rocketmail
  telus
  sympatico
  cox
  gmai
  email
  aim
  yandex
  gamil
  gmx
  student
  students
  earthlink
  gnail
  juno
  gmsil
  netzero
  ail
  gmil
  gmal
  hmail
  yaho
  alumni
  gmial
  googlemail
  tampabay
  mtroyal
  usa
  cfl
  yshoo
  protonmail
  rediffmail
  liberty
  maine
  inbox
  optimum
  example
  yhaoo
  yorku
  mchsi
  yahoi
  zoho
  hushmail
  libero
  hotmal
  ukr
  wowway
  post
  lycos
  yaboo
  contractor
  yahool
].freeze

Instance Method Summary

Instance Method Details

#ranges_overlap?(r1, r2) ⇒ Boolean

Checks whether two ranges overlap, that is, whether either range covers the first element of the other.

Returns:

  • (Boolean)


# File 'lib/ramparts/helpers.rb', line 44

def ranges_overlap?(r1, r2)
  r1.cover?(r2.first) || r2.cover?(r1.first)
end
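
A short usage sketch may help with the edge cases. The method body is copied from the listing above so the snippet runs on its own; the sample ranges are made up.

```ruby
# Copied from the listing above so the example is self-contained.
def ranges_overlap?(r1, r2)
  r1.cover?(r2.first) || r2.cover?(r1.first)
end

partial   = ranges_overlap?(0...5, 3...8)  # true: 3 lies inside 0...5
reversed  = ranges_overlap?(3...8, 0...5)  # true: argument order does not matter
adjacent  = ranges_overlap?(0...5, 5...9)  # false: an exclusive end does not cover 5
inclusive = ranges_overlap?(0..5, 5..9)    # true: inclusive ranges share 5
```

Exclusive ranges such as `0...5` mirror the `start_offset...end_offset` form used by `replace`, so two matches that merely touch are not treated as overlapping.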

#replace(text, instances, &block) ⇒ Object

Replaces each matched instance in the text with the value the block returns for that instance (the insertable). Returns an altered copy; the original text is left unchanged.



# File 'lib/ramparts/helpers.rb', line 19

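def replace(text, instances, &block) # rubocop:disable Lint/UnusedMethodArgument
  # Work on a copy so the caller's string is left untouched
  altered_text = String.new(text)

  # The block returns the replacement (insertable) for each instance.
  # Offsets index into altered_text as it is mutated, so a replacement of a
  # different length shifts the text that follows it.
  instances.each do |instance|
    insertable = yield instance
    altered_text[instance[:start_offset]...instance[:end_offset]] = insertable
  end
  altered_text
end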
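
The following usage sketch shows one way to call `replace`. The method is reproduced so the snippet is self-contained, and the sample text and instance hashes are made up. Passing the instances last-to-first is a choice made for this example: each replacement then only changes text after the offsets that are still to be processed.

```ruby
# Reproduced from the listing above so the example is self-contained.
def replace(text, instances)
  altered_text = String.new(text)
  instances.each do |instance|
    insertable = yield instance
    altered_text[instance[:start_offset]...instance[:end_offset]] = insertable
  end
  altered_text
end

text = 'a@b.co or c@d.co'

# Made-up instances in the hash shape that scan returns.
instances = [
  { start_offset: 0,  end_offset: 6,  value: 'a@b.co', type: :email },
  { start_offset: 10, end_offset: 16, value: 'c@d.co', type: :email }
]

# Replace from the end of the text so earlier offsets stay valid.
censored = replace(text, instances.reverse) { |instance| "[#{instance[:type]}]" }
# censored is '[email] or [email]'; text itself is unchanged
```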

#scan(text, regex, type) ⇒ Object

Scans the text with the given regex and returns one hash per match, holding the match's start and end offsets, its value, and the given type.



# File 'lib/ramparts/helpers.rb', line 30

def scan(text, regex, type)
  text
    .enum_for(:scan, regex)
    .map do
      # Regexp.last_match holds the MatchData for the current match
      match = Regexp.last_match
      {
        start_offset: match.begin(0),
        end_offset: match.end(0),
        value: match.to_s,
        type: type
      }
    end
end
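
A brief usage sketch of `scan` follows. The method is reproduced so the snippet runs standalone; the phone-like pattern and the `:phone` type label are assumptions chosen for illustration, not values taken from the library.

```ruby
# Reproduced from the listing above so the example is self-contained.
def scan(text, regex, type)
  text.enum_for(:scan, regex).map do
    match = Regexp.last_match
    {
      start_offset: match.begin(0),
      end_offset: match.end(0),
      value: match.to_s,
      type: type
    }
  end
end

# Hypothetical pattern and type label.
matches = scan('call 555-1234 or 555-9876', /\d{3}-\d{4}/, :phone)
# matches[0] covers offsets 5...13 ('555-1234')
# matches[1] covers offsets 17...25 ('555-9876')
```

The end offset is exclusive, so `text[start_offset...end_offset]` returns exactly the matched value.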