Module: DeBiLinguifier
Overview
This module only works with uppercase, unaccented (detoned) characters of the latin and greek charsets.
Constant Summary
- SYMBOLS =
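The symbols (whitespace, digits and punctuation) that may appear in a phrase; they also act as word delimiters when a phrase is split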
'\s\.\,\@\d\-\(\)\:\/\&\''.freeze
- GREEK_LOOKING_CHARS =
A regular expression to check if every character of the input phrase is a greek capital, a latin capital that looks like a greek one, or a symbol
Regexp.new("(^[Α-ΩABEHIKMNOPTXYZ#{SYMBOLS}]+)+$").freeze
- LATIN_LOOKING_CHARS =
A regular expression to check if every character of the input phrase is a latin capital, a greek capital that looks like a latin one, or a symbol
Regexp.new("(^[A-ZΑΒΕΗΙΚΜΝΟΡΤΥΧΖ#{SYMBOLS}]+)+$").freeze
- LATIN_ALPHABET_PLUS_SYMBOLS =
A regular expression to match strings already written only with latin charset
Regexp.new("(^[A-Z#{SYMBOLS}]+)+$").freeze
- GREEK_ALPHABET_PLUS_SYMBOLS =
A regular expression to match strings already written only with greek charset
Regexp.new("(^[Α-Ω#{SYMBOLS}]+)+$").freeze
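To see how the four patterns differ, the sketch below rebuilds them and reports which ones match a given phrase. It is an illustration only: the wrapper module CharsetPatternSketch, the helper constants LATIN_TWINS and GREEK_TWINS, the method matching_patterns and the sample phrases are assumptions, not part of the library. Greek characters are written as \u escapes so that the script of each look-alike character is unambiguous.

```ruby
# Hypothetical wrapper module; only the four patterns mirror the constants above.
module CharsetPatternSketch
  SYMBOLS = '\s\.\,\@\d\-\(\)\:\/\&\''.freeze

  # Latin capitals that have a greek look-alike, and those look-alikes in the same order.
  LATIN_TWINS = 'ABEHIKMNOPTXYZ'.freeze
  GREEK_TWINS = "\u0391\u0392\u0395\u0397\u0399\u039A\u039C\u039D\u039F\u03A1\u03A4\u03A7\u03A5\u0396".freeze

  GREEK_LOOKING_CHARS = Regexp.new("(^[\u0391-\u03A9#{LATIN_TWINS}#{SYMBOLS}]+)+$")
  LATIN_LOOKING_CHARS = Regexp.new("(^[A-Z#{GREEK_TWINS}#{SYMBOLS}]+)+$")
  LATIN_ALPHABET_PLUS_SYMBOLS = Regexp.new("(^[A-Z#{SYMBOLS}]+)+$")
  GREEK_ALPHABET_PLUS_SYMBOLS = Regexp.new("(^[\u0391-\u03A9#{SYMBOLS}]+)+$")

  # Returns the names of the patterns that match the phrase.
  def self.matching_patterns(phrase)
    {
      greek_only: GREEK_ALPHABET_PLUS_SYMBOLS,
      latin_only: LATIN_ALPHABET_PLUS_SYMBOLS,
      greek_looking: GREEK_LOOKING_CHARS,
      latin_looking: LATIN_LOOKING_CHARS
    }.select { |_name, pattern| phrase.match?(pattern) }.keys
  end
end

pure_latin = 'ATHENS 2004'
mixed      = "A\u0398HNA" # latin A, H, N, A around a greek theta

p CharsetPatternSketch.matching_patterns(pure_latin) # [:latin_only, :latin_looking]
p CharsetPatternSketch.matching_patterns(mixed)      # [:greek_looking]
```

Note that digits and spaces count as symbols, so 'ATHENS 2004' still qualifies as latin only.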
Instance Method Summary
- #can_write_only_greek?(input) ⇒ Boolean
Determine if the whole phrase can be written only with greek charset.
- #can_write_only_latin?(input) ⇒ Boolean
Determine if the whole phrase can be written only with latin charset.
- #dbl(input) ⇒ String
Only works with latin and greek charsets.
- #is_greek_only?(input) ⇒ Boolean
Determine if the input phrase is already only in greek charset.
- #is_latin_only?(input) ⇒ Boolean
Determine if the input phrase is already only in latin charset.
- #return_in_greek(input) ⇒ Object
Return the phrase using the greek characters only.
- #return_in_latin(input) ⇒ Object
Return the phrase using the latin characters only.
- #return_in_mixed_charset(input) ⇒ Object
Return the phrase using both charsets.
Instance Method Details
#can_write_only_greek?(input) ⇒ Boolean
Determine if the whole phrase can be written only with greek charset
    # File 'lib/debilinguifier.rb', line 55
    def can_write_only_greek?(input)
      !!(input.match(GREEK_LOOKING_CHARS))
    end
#can_write_only_latin?(input) ⇒ Boolean
Determine if the whole phrase can be written only with latin charset
    # File 'lib/debilinguifier.rb', line 60
    def can_write_only_latin?(input)
      !!(input.match(LATIN_LOOKING_CHARS))
    end
#dbl(input) ⇒ String
Only works with latin and greek charsets. An input phrase can only be one of five things:
1) Already written only in the greek or only in the latin charset.
2) Written in a mixed charset, but can be written with just the greek charset.
3) Written in a mixed charset, but can be written with just the latin charset.
4) Written in a mixed charset, and cannot be written with only one of the [greek, latin] charsets. In this case we split the phrase into words and apply the above rules to each word separately. If case 4 applies to a single word, there is nothing more we can do for it than return it "as is".
5) Written in a mixed charset, but can be written either with just the greek charset or just the latin charset.
Note: We deliberately ignore case 5, as it is of no use at the moment as a separate case. It is the intersection of cases 2 and 3, so it is handled as case 2.
    # File 'lib/debilinguifier.rb', line 32
    def dbl(input)
      if(is_greek_only?(input) || is_latin_only?(input)) # Case 1
        input
      elsif(can_write_only_greek?(input)) # Case 2
        return_in_greek(input)
      elsif(can_write_only_latin?(input)) # Case 3
        return_in_latin(input)
      else # Case 4
        return_in_mixed_charset(input)
      end
    end
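The four handled cases can be exercised with a condensed stand-in for the module. This is a sketch under stated assumptions, not the library itself: the module name DblSketch, the shortened constant names and the sample phrases are hypothetical, and the helper methods are inlined into dbl to keep the example short. Greek characters are written as \u escapes to make each character's script explicit.

```ruby
# Hypothetical, condensed stand-in for the module so the example runs on its own.
module DblSketch
  extend self

  SYMBOLS = '\s\.\,\@\d\-\(\)\:\/\&\''.freeze
  LATIN_TWINS = 'ABEHIKMNOPTXYZ'.freeze
  GREEK_TWINS = "\u0391\u0392\u0395\u0397\u0399\u039A\u039C\u039D\u039F\u03A1\u03A4\u03A7\u03A5\u0396".freeze

  GREEK_ONLY    = Regexp.new("(^[\u0391-\u03A9#{SYMBOLS}]+)+$")
  LATIN_ONLY    = Regexp.new("(^[A-Z#{SYMBOLS}]+)+$")
  GREEK_LOOKING = Regexp.new("(^[\u0391-\u03A9#{LATIN_TWINS}#{SYMBOLS}]+)+$")
  LATIN_LOOKING = Regexp.new("(^[A-Z#{GREEK_TWINS}#{SYMBOLS}]+)+$")

  def dbl(input)
    if input.match?(GREEK_ONLY) || input.match?(LATIN_ONLY) # Case 1
      input
    elsif input.match?(GREEK_LOOKING)                        # Case 2
      input.tr(LATIN_TWINS, GREEK_TWINS)
    elsif input.match?(LATIN_LOOKING)                        # Case 3
      input.tr(GREEK_TWINS, LATIN_TWINS)
    else                                                     # Case 4
      words = input.split(/(?<=[#{SYMBOLS}])/)
      words.length == 1 ? input : words.map { |word| dbl(word) }.join
    end
  end
end

case1 = 'ATHENS'                     # already latin only
case2 = "A\u0398HNA"                 # latin A, H, N, A plus a greek theta
case3 = "\u0391THENS"                # greek alpha followed by latin THENS
case4 = "A\u0398HNA \u0391THENS"     # neither charset fits the whole phrase
stuck = "\u03A6C"                    # greek phi next to latin C: no fix possible

[case1, case2, case3, case4, stuck].each { |phrase| puts DblSketch.dbl(phrase) }
```

Case 4 is resolved word by word: the first word comes back fully greek and the second fully latin, while the last phrase is returned unchanged because neither charset can express both of its characters.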
#is_greek_only?(input) ⇒ Boolean
Determine if the input phrase is already only in greek charset
    # File 'lib/debilinguifier.rb', line 45
    def is_greek_only?(input)
      !!(input.match(GREEK_ALPHABET_PLUS_SYMBOLS))
    end
#is_latin_only?(input) ⇒ Boolean
Determine if the input phrase is already only in latin charset
    # File 'lib/debilinguifier.rb', line 50
    def is_latin_only?(input)
      !!(input.match(LATIN_ALPHABET_PLUS_SYMBOLS))
    end
#return_in_greek(input) ⇒ Object
Return the phrase using the greek characters only
    # File 'lib/debilinguifier.rb', line 65
    def return_in_greek(input)
      input.tr('abehikmnoptxyz'.upcase, 'αβεηικμνορτχυζ'.upcase)
    end
#return_in_latin(input) ⇒ Object
Return the phrase using the latin characters only
    # File 'lib/debilinguifier.rb', line 70
    def return_in_latin(input)
      input.tr('αβεηικμνορτχυζ'.upcase, 'abehikmnoptxyz'.upcase)
    end
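Both conversions rely on String#tr, which maps the character at each position of its first argument to the character at the same position of the second. The sketch below shows the round trip; the module name TwinMapSketch, its method names and the sample words are assumptions made for illustration, and the greek look-alikes are spelled as \u escapes.

```ruby
# Hypothetical helper module illustrating the positional mapping done by String#tr.
module TwinMapSketch
  LATIN_TWINS = 'ABEHIKMNOPTXYZ'.freeze
  # Same order as LATIN_TWINS: X pairs with chi (U+03A7) and Y with upsilon (U+03A5).
  GREEK_TWINS = "\u0391\u0392\u0395\u0397\u0399\u039A\u039C\u039D\u039F\u03A1\u03A4\u03A7\u03A5\u0396".freeze

  def self.in_greek(input)
    input.tr(LATIN_TWINS, GREEK_TWINS)
  end

  def self.in_latin(input)
    input.tr(GREEK_TWINS, LATIN_TWINS)
  end
end

mixed = "A\u0398HNA" # latin letters around a greek theta
greek = TwinMapSketch.in_greek(mixed)

puts greek
puts greek == "\u0391\u0398\u0397\u039D\u0391"                          # true
puts TwinMapSketch.in_latin(TwinMapSketch.in_greek('TAXI')) == 'TAXI'   # true
```

The two strings are ordered by visual resemblance rather than alphabetically, which is why the greek side ends in chi, upsilon, zeta. Characters without a look-alike, such as the theta above, pass through unchanged.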
#return_in_mixed_charset(input) ⇒ Object
Return the phrase using both charsets
    # File 'lib/debilinguifier.rb', line 75
    def return_in_mixed_charset(input)
      # Split the phrase into words and recursively try to return each word in the "correct" charset.
      # If that is not possible (e.g. a word contains both "Φ" and "C"), return it as it was originally.
      # We first split the input phrase, based on the SYMBOLS delimiters
      words_arr = input.split(/(?<=[#{SYMBOLS}])/)
      if words_arr.length == 1
        # If it was only one word, return it.
        return (words_arr.join.to_s)
      else
        # Else apply dbl to each word we got after splitting input
        words_arr2 = []
        words_arr.each do |word|
          words_arr2 << dbl(word)
        end
        return words_arr2.join.to_s
      end
    end
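The split uses a lookbehind, so it cuts after each delimiter without consuming it; every delimiter stays attached to the end of the preceding piece and joining the pieces restores the original phrase. A minimal sketch of that behaviour follows, where the module name SplitSketch, the constant AFTER_SYMBOL and the sample phrases are assumptions, not library names.

```ruby
# Hypothetical module showing how the lookbehind split keeps delimiters in place.
module SplitSketch
  SYMBOLS = '\s\.\,\@\d\-\(\)\:\/\&\''.freeze

  # Zero-width match at every position that directly follows a symbol.
  AFTER_SYMBOL = /(?<=[#{SYMBOLS}])/

  def self.words(phrase)
    phrase.split(AFTER_SYMBOL)
  end
end

p SplitSketch.words('AB CD') # ["AB ", "CD"]
p SplitSketch.words('A, B')  # ["A,", " ", "B"]

email = 'INFO@EXAMPLE.COM'
p SplitSketch.words(email).join == email # true
```

Because nothing is dropped by the split, the method can rejoin the converted words with a plain join and does not need to track which delimiter separated them.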