Class: PragmaticSegmenter::List

Inherits:
Object
Defined in:
lib/pragmatic_segmenter/list.rb

Overview

This class searches for a list within a string and adds newlines before each list item.

Constant Summary

ROMAN_NUMERALS =
%w(i ii iii iv v vi vii viii ix x xi xii xiii xiv x xi xii xiii xv xvi xvii xviii xix xx)
LATIN_NUMERALS =
('a'..'z').to_a
ALPHABETICAL_LIST_WITH_PERIODS =
/(?<=^)[a-z](?=\.)|(?<=\A)[a-z](?=\.)|(?<=\s)[a-z](?=\.)/
ALPHABETICAL_LIST_WITH_PARENS =
/(?<=\()[a-z]+(?=\))|(?<=^)[a-z]+(?=\))|(?<=\A)[a-z]+(?=\))|(?<=\s)[a-z]+(?=\))/i
SubstituteListPeriodRule =
Rule.new(//, '')
ListMarkerRule =
Rule.new(//, '')
SpaceBetweenListItemsFirstRule =
Rule.new(/(?<=\S\S|^)\s(?=\S\s*\d{1,2}♨)/, "\r")
SpaceBetweenListItemsSecondRule =
Rule.new(/(?<=\S\S|^)\s(?=\d{1,2}♨)/, "\r")
SpaceBetweenListItemsThirdRule =
Rule.new(/(?<=\S\S|^)\s(?=\d{1,2}☝)/, "\r")
NUMBERED_LIST_REGEX_1 =
/\s\d{1,2}(?=\.\s)|^\d{1,2}(?=\.\s)|\s\d{1,2}(?=\.\))|^\d{1,2}(?=\.\))|(?<=\s\-)\d{1,2}(?=\.\s)|(?<=^\-)\d{1,2}(?=\.\s)|(?<=\s\⁃)\d{1,2}(?=\.\s)|(?<=^\⁃)\d{1,2}(?=\.\s)|(?<=s\-)\d{1,2}(?=\.\))|(?<=^\-)\d{1,2}(?=\.\))|(?<=\s\⁃)\d{1,2}(?=\.\))|(?<=^\⁃)\d{1,2}(?=\.\))/
NUMBERED_LIST_REGEX_2 =
/(?<=\s)\d{1,2}\.(?=\s)|^\d{1,2}\.(?=\s)|(?<=\s)\d{1,2}\.(?=\))|^\d{1,2}\.(?=\))|(?<=\s\-)\d{1,2}\.(?=\s)|(?<=^\-)\d{1,2}\.(?=\s)|(?<=\s\⁃)\d{1,2}\.(?=\s)|(?<=^\⁃)\d{1,2}\.(?=\s)|(?<=\s\-)\d{1,2}\.(?=\))|(?<=^\-)\d{1,2}\.(?=\))|(?<=\s\⁃)\d{1,2}\.(?=\))|(?<=^\⁃)\d{1,2}\.(?=\))/
NUMBERED_LIST_PARENS_REGEX =
/\d{1,2}(?=\)\s)/
EXTRACT_ALPHABETICAL_LIST_LETTERS_REGEX =
/\([a-z]+(?=\))|(?<=^)[a-z]+(?=\))|(?<=\A)[a-z]+(?=\))|(?<=\s)[a-z]+(?=\))/i
ALPHABETICAL_LIST_LETTERS_AND_PERIODS_REGEX =
/(?<=^)[a-z]\.|(?<=\A)[a-z]\.|(?<=\s)[a-z]\./i
ROMAN_NUMERALS_IN_PARENTHESES =
/\(((?=[mdclxvi])m*(c[md]|d?c*)(x[cl]|l?x*)(i[xv]|v?i*))\)(?=\s[A-Z])/
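
To illustrate how the SpaceBetweenListItems rules behave, the following minimal sketch applies the pattern from SpaceBetweenListItemsSecondRule with a plain gsub rather than through the Rule class. It assumes that ♨ is a placeholder an earlier pass has already substituted for the period after each list number; this page does not state that explicitly. The constant name SPACE_BEFORE_ITEM and the sample string are hypothetical.

```ruby
# Pattern copied from SpaceBetweenListItemsSecondRule in the listing above.
# SPACE_BEFORE_ITEM is a hypothetical name used only for this sketch.
SPACE_BEFORE_ITEM = /(?<=\S\S|^)\s(?=\d{1,2}♨)/

# Assumption: the period after each list number has already been swapped for ♨.
marked = "1♨ apples 2♨ pears 10♨ plums"

# Replace only the whitespace that directly precedes a marked list number.
separated = marked.gsub(SPACE_BEFORE_ITEM, "\r")

# Each list item now sits on its own carriage-return-delimited segment.
items = separated.split("\r")
puts items
```

The whitespace after "1♨" is left alone because it is not followed by another marked number; only the gaps before "2♨" and "10♨" become "\r".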

Constructor Details

#initialize(text:) ⇒ List

Returns a new instance of List.



# File 'lib/pragmatic_segmenter/list.rb', line 50

def initialize(text:)
  @text = text.dup
end
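
The constructor stores a copy of its argument, which matters because methods such as replace_parens edit the text in place with gsub!. A minimal sketch of that effect follows, using a hypothetical stand-in class (ListSketch) that mirrors only the constructor and reader shown on this page, not the gem itself.

```ruby
# Hypothetical stand-in mirroring only the documented constructor and reader.
class ListSketch
  attr_reader :text

  def initialize(text:)
    @text = text.dup # work on a copy, as the documented constructor does
  end
end

original = "1. one 2. two"
list = ListSketch.new(text: original)

# An in-place edit on the copy, similar in kind to the gsub! in replace_parens.
list.text.gsub!("one", "ONE")

puts original  # the caller's string is untouched
puts list.text # only the internal copy changed
```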

Instance Attribute Details

#text ⇒ Object (readonly)

Returns the value of attribute text.



# File 'lib/pragmatic_segmenter/list.rb', line 49

def text
  @text
end

Instance Method Details

#add_line_break ⇒ Object



# File 'lib/pragmatic_segmenter/list.rb', line 54

def add_line_break
  format_alphabetical_lists
  format_roman_numeral_lists
  format_numbered_list_with_periods
  format_numbered_list_with_parens
end

#replace_parens ⇒ Object



# File 'lib/pragmatic_segmenter/list.rb', line 61

def replace_parens
  text.gsub!(ROMAN_NUMERALS_IN_PARENTHESES, '&✂&\1&⌬&'.freeze)
  text
end
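
The effect of replace_parens can be seen without the gem by applying the same pattern and replacement string to a plain string. This is a minimal sketch: the regex and replacement are copied from the listing above, while the helper name replace_parens_sketch and the sample inputs are hypothetical. It uses the non-mutating gsub so the inputs stay unchanged.

```ruby
# Pattern copied from ROMAN_NUMERALS_IN_PARENTHESES in the listing above.
ROMAN_NUMERALS_IN_PARENTHESES =
  /\(((?=[mdclxvi])m*(c[md]|d?c*)(x[cl]|l?x*)(i[xv]|v?i*))\)(?=\s[A-Z])/

# Hypothetical helper, not part of the gem: returns a new string in which each
# parenthesized lowercase Roman numeral followed by whitespace and a capital
# letter has its parentheses swapped for the &✂& and &⌬& markers.
def replace_parens_sketch(text)
  text.gsub(ROMAN_NUMERALS_IN_PARENTHESES, '&✂&\1&⌬&')
end

puts replace_parens_sketch("(i) Hello (ii) World")
puts replace_parens_sketch("(i) hello") # no capital letter follows, so no change
puts replace_parens_sketch("(a) Hello") # "a" is not a Roman numeral letter, so no change
```

Note that the pattern has no /i flag, so only lowercase numerals match, and the lookahead (?=\s[A-Z]) restricts matches to markers followed by a capitalized word.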