Module: DeBiLinguifier

Extended by:
DeBiLinguifier
Included in:
DeBiLinguifier
Defined in:
lib/debilinguifier.rb

Overview

This only works with uppercase, unaccented (de-toned) characters of the Latin and Greek charsets.
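Lowercase or accented input therefore has to be normalized before it is passed to the module. A minimal sketch of such a preprocessing step, assuming Ruby's built-in `String#unicode_normalize`; the helper name `normalize_for_dbl` is hypothetical and not part of DeBiLinguifier:

```ruby
# Hypothetical helper (not part of DeBiLinguifier): upcase and strip tone marks.
def normalize_for_dbl(text)
  text.unicode_normalize(:nfd) # decompose accented letters into base letter + combining mark
      .gsub(/\p{Mn}/, '')      # drop the combining marks (tonos, acute accent, ...)
      .upcase
end

athens = "\u0391\u03B8\u03AE\u03BD\u03B1" # "Athina" in Greek, with an accented eta
puts normalize_for_dbl(athens)
```

Note that this sketch removes every combining mark, including diaeresis, which may or may not be what a given application wants.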

Constant Summary

SYMBOLS =

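The non-letter characters (whitespace, digits and punctuation) that may appear in a phrase and that act as word delimiters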

'\s\.\,\@\d\-\(\)\:\/\&\''.freeze
GREEK_LOOKING_CHARS =

A regular expression to check if every character of the input phrase is a Greek capital, a Latin capital that looks like a Greek one, or a symbol

Regexp.new("(^[Α-ΩABEHIKMNOPTXYZ#{SYMBOLS}]+)+$").freeze
LATIN_LOOKING_CHARS =

A regular expression to check if every character of the input phrase is a Latin capital, a Greek capital that looks like a Latin one, or a symbol

Regexp.new("(^[A-ZΑΒΕΗΙΚΜΝΟΡΤΥΧΖ#{SYMBOLS}]+)+$").freeze
LATIN_ALPHABET_PLUS_SYMBOLS =

A regular expression to match strings already written only with latin charset

Regexp.new("(^[A-Z#{SYMBOLS}]+)+$").freeze
GREEK_ALPHABET_PLUS_SYMBOLS =

A regular expression to match strings already written only with the Greek charset

Regexp.new("(^[Α-Ω#{SYMBOLS}]+)+$").freeze

Instance Method Summary

Instance Method Details

#can_write_only_greek?(input) ⇒ Boolean

Determine if the whole phrase can be written with the Greek charset only

Returns:

  • (Boolean)


# File 'lib/debilinguifier.rb', line 55

def can_write_only_greek?(input)
  !!(input.match(GREEK_LOOKING_CHARS))
end

#can_write_only_latin?(input) ⇒ Boolean

Determine if the whole phrase can be written with the Latin charset only

Returns:

  • (Boolean)


# File 'lib/debilinguifier.rb', line 60

def can_write_only_latin?(input)
  !!(input.match(LATIN_LOOKING_CHARS))
end

#dbl(input) ⇒ String

Only works with the Latin and Greek charsets. An input phrase can be only one of five things:

1) Already written only in the Greek or only in the Latin charset.
2) Written in a mixed charset, but can be written with just the Greek charset.
3) Written in a mixed charset, but can be written with just the Latin charset.
4) Written in a mixed charset, and cannot be written with only one of the [Greek, Latin] charsets. In this case we split the phrase into words and apply the above rules to each word separately. If case 4 applies to a single word, there is nothing more we can do for it than return it "as is".
5) Written in a mixed charset, but can be written either with just the Greek charset or with just the Latin charset.

Note: We deliberately ignore case 5, as it is of no use at the moment as a separate case. It is the intersection of cases 2 and 3, so it is handled as case 2.

Returns:

  • (String)

    the de-bi-linguized string



# File 'lib/debilinguifier.rb', line 32

def dbl(input)
  if(is_greek_only?(input) || is_latin_only?(input)) # Case 1
    input
  elsif(can_write_only_greek?(input))                # Case 2
    return_in_greek(input)
  elsif(can_write_only_latin?(input))                # Case 3
    return_in_latin(input)
  else                                               # Case 4
    return_in_mixed_charset(input)
  end
end
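To see which branch a given input takes, the dispatch can be mirrored with the constants from the summary. This is only an illustrative sketch: the constants are copied from the listing, while the helper `dbl_case` is hypothetical and returns the case number instead of a converted string. Look-alike characters are written as `\u` escapes so that the script of each letter is unambiguous.

```ruby
SYMBOLS = '\s\.\,\@\d\-\(\)\:\/\&\''.freeze
GREEK_LOOKING_CHARS = Regexp.new("(^[Α-ΩABEHIKMNOPTXYZ#{SYMBOLS}]+)+$").freeze
LATIN_LOOKING_CHARS = Regexp.new("(^[A-ZΑΒΕΗΙΚΜΝΟΡΤΥΧΖ#{SYMBOLS}]+)+$").freeze
LATIN_ALPHABET_PLUS_SYMBOLS = Regexp.new("(^[A-Z#{SYMBOLS}]+)+$").freeze
GREEK_ALPHABET_PLUS_SYMBOLS = Regexp.new("(^[Α-Ω#{SYMBOLS}]+)+$").freeze

# Hypothetical helper: report which case of dbl an input falls into.
def dbl_case(input)
  if input.match?(GREEK_ALPHABET_PLUS_SYMBOLS) || input.match?(LATIN_ALPHABET_PLUS_SYMBOLS)
    1 # already in a single charset
  elsif input.match?(GREEK_LOOKING_CHARS)
    2 # can be rewritten in Greek (this also absorbs case 5)
  elsif input.match?(LATIN_LOOKING_CHARS)
    3 # can be rewritten in Latin
  else
    4 # genuinely mixed, handled word by word
  end
end

samples = {
  'pure Greek'                      => "\u0391\u0398\u0397\u039D\u0391",
  'Latin A, H, N, A + Greek theta'  => "A\u0398HNA",
  'Latin C, L + Greek omicron, alpha' => "C\u039FL\u0391",
  'Greek phi + Latin C'             => "\u03A6C",
  'Greek alpha + Latin B'           => "\u0391B"
}
samples.each { |label, text| puts "#{label}: case #{dbl_case(text)}" }
```

The last sample can be written in either charset (case 5) and is reported as case 2, matching the note in the description.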

#is_greek_only?(input) ⇒ Boolean

Determine if the input phrase is already written in the Greek charset only

Returns:

  • (Boolean)


# File 'lib/debilinguifier.rb', line 45

def is_greek_only?(input)
  !!(input.match(GREEK_ALPHABET_PLUS_SYMBOLS))
end

#is_latin_only?(input) ⇒ Boolean

Determine if the input phrase is already written in the Latin charset only

Returns:

  • (Boolean)


# File 'lib/debilinguifier.rb', line 50

def is_latin_only?(input)
  !!(input.match(LATIN_ALPHABET_PLUS_SYMBOLS))
end
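One detail worth checking when using these predicates: in Ruby, `^` and `$` anchor to line boundaries rather than to the whole string, so with multi-line input a single conforming line is enough for a match. The sketch below illustrates this with the constant from the summary; `STRICT_GREEK_ONLY` is a hypothetical variant anchored with `\A` and `\z`, shown only for comparison and not part of the module.

```ruby
SYMBOLS = '\s\.\,\@\d\-\(\)\:\/\&\''.freeze
GREEK_ALPHABET_PLUS_SYMBOLS = Regexp.new("(^[Α-Ω#{SYMBOLS}]+)+$").freeze

# Hypothetical stricter variant, anchored to the start and end of the string.
STRICT_GREEK_ONLY = Regexp.new("\\A[Α-Ω#{SYMBOLS}]+\\z").freeze

# Latin "ABC" on the first line, Greek alpha-beta-gamma on the second.
two_lines = "ABC\n\u0391\u0392\u0393"

line_anchored   = two_lines.match?(GREEK_ALPHABET_PLUS_SYMBOLS) # second line alone satisfies ^...$
string_anchored = two_lines.match?(STRICT_GREEK_ONLY)           # the Latin first line is rejected

puts "line-anchored: #{line_anchored}, string-anchored: #{string_anchored}"
```

For single-line phrases, which appear to be the intended input, the two forms behave the same.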

#return_in_greek(input) ⇒ Object

Return the phrase using Greek characters only, replacing each Latin look-alike capital with its Greek counterpart



# File 'lib/debilinguifier.rb', line 65

def return_in_greek(input)
  input.tr('abehikmnoptxyz'.upcase, 'αβεηικμνορτχυζ'.upcase)
end

#return_in_latin(input) ⇒ Object

Return the phrase using Latin characters only, replacing each Greek look-alike capital with its Latin counterpart



# File 'lib/debilinguifier.rb', line 70

def return_in_latin(input)
  input.tr('αβεηικμνορτχυζ'.upcase, 'abehikmnoptxyz'.upcase) 
end
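Both conversions rely on `String#tr` pairing the two 14-character strings position by position. The following sketch prints that pairing and runs a round trip; the constant names `LATIN_LOOKALIKES` and `GREEK_LOOKALIKES` are introduced here for illustration only, and the strings themselves are the ones used in the methods above.

```ruby
LATIN_LOOKALIKES = 'abehikmnoptxyz'.upcase
GREEK_LOOKALIKES = 'αβεηικμνορτχυζ'.upcase

# Show the position-by-position pairing with code points, since the glyphs look identical.
pairs = LATIN_LOOKALIKES.chars.zip(GREEK_LOOKALIKES.chars)
pairs.each do |latin, greek|
  puts format('%s U+%04X -> %s U+%04X', latin, latin.ord, greek, greek.ord)
end

mixed = "A\u0398HNA"                                # Latin A, H, N, A around a Greek theta
greek = mixed.tr(LATIN_LOOKALIKES, GREEK_LOOKALIKES) # every letter is now Greek
back  = greek.tr(GREEK_LOOKALIKES, LATIN_LOOKALIKES) # theta has no Latin look-alike and stays Greek

puts greek.codepoints.map { |cp| format('U+%04X', cp) }.join(' ')
puts back == mixed
```

Characters without a look-alike in the other alphabet pass through `tr` unchanged, which is why the predicates are checked before either conversion is applied.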

#return_in_mixed_charset(input) ⇒ Object

Return the phrase using both charsets, converting each word to a single charset where possible



# File 'lib/debilinguifier.rb', line 75

def return_in_mixed_charset(input)
  # Split the phrase into words and recursively try to return each word in the "correct" charset.
  # If that is not possible (e.g. a word contains both "Φ" and "C"), return it as it was originally.
  # We first split the input phrase, based on the SYMBOLS delimiters.
  words_arr = input.split(/(?<=[#{SYMBOLS}])/)
  if words_arr.length == 1            # If it was only one word, return it.
    return (words_arr.join.to_s)
  else                                # Else apply dbl to each word we got after splitting input
    words_arr2 =[]
    words_arr.each do |word|
      words_arr2 << dbl(word)
    end
    return words_arr2.join.to_s
  end
end
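The split uses a zero-width lookbehind, so each delimiter stays attached to the end of the preceding word and joining the pieces restores the original phrase. A small sketch of that behaviour, using the `SYMBOLS` constant from the summary; the sample phrases are made up for illustration:

```ruby
SYMBOLS = '\s\.\,\@\d\-\(\)\:\/\&\''.freeze

# Greek phi-omega-sigma, a space, Latin "COLA", a hyphen, then Greek phi + Latin C.
phrase = "\u03A6\u03A9\u03A3 COLA-\u03A6C"

# Split after every symbol; the lookbehind consumes nothing, so no character is lost.
words = phrase.split(/(?<=[#{SYMBOLS}])/)

p words                # three pieces, the first two ending in their delimiter
p words.join == phrase # the pieces concatenate back to the input

# Digits are part of SYMBOLS too, so they also act as split points.
p 'A1B'.split(/(?<=[#{SYMBOLS}])/)
```

Because every piece ends at the first delimiter it contains, a piece passed back through `dbl` that reaches this method again splits into a single element and is returned unchanged, which ends the recursion.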