Module: Chinese::HelperMethods

Included in: Scraper, Vocab

Defined in: lib/chinese_vocab/modules/helper_methods.rb

Class Method Summary

.included(klass) ⇒ Object

Instance Method Summary

#distinct_words(word) ⇒ Object

Input: “除了。。。以外。。。” Output: [“除了”, “以外”].
#include_every_char?(word, sentence) ⇒ Boolean

Return true if every distinct word as defined by #distinct_words can be found in the given sentence.
#is_unicode?(word) ⇒ Boolean

Class Method Details

.included(klass) ⇒ `Object`



# File 'lib/chinese_vocab/modules/helper_methods.rb', line 6

def self.included(klass)
  klass.extend(self)
end
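
The hook above extends the including class with the module itself, so each helper becomes callable both on instances and on the class. A minimal sketch of that effect follows; `HelperSketch` and `Demo` are hypothetical names used only for illustration, standing in for the real module and for including classes such as Scraper and Vocab.

```ruby
# Hypothetical stand-in for Chinese::HelperMethods, reduced to one helper.
module HelperSketch
  # Ruby calls this hook whenever the module is included in a class.
  def self.included(klass)
    klass.extend(self) # also expose the instance methods as class methods
  end

  def distinct_words(word)
    word.scan(/\p{Word}+/)
  end
end

# Hypothetical including class (not part of the library).
class Demo
  include HelperSketch
end

p Demo.new.distinct_words("除了。。。以外。。。") # => ["除了", "以外"]
p Demo.distinct_words("除了。。。以外。。。")     # => ["除了", "以外"]
```

Without the `included` hook, only the instance-level call would work; the class-level call would raise `NoMethodError`.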

Instance Method Details

#distinct_words(word) ⇒ `Object`

Input: “除了。。。以外。。。” Output: [“除了”, “以外”]

# File 'lib/chinese_vocab/modules/helper_methods.rb', line 22

def distinct_words(word)
  # http://stackoverflow.com/a/3976004
  # Alternative: /[[:word:]]+/
  # Returns an array of runs of consecutive Unicode word characters;
  # punctuation and whitespace act as separators and are dropped.
  word.scan(/\p{Word}+/)
end
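
A short usage sketch for the method above. The method is copied as a top-level definition so the snippet runs without the library; the sample strings other than the documented input are illustrative assumptions.

```ruby
# Standalone copy of the helper so the snippet runs without the library.
def distinct_words(word)
  word.scan(/\p{Word}+/)
end

p distinct_words("除了。。。以外。。。") # => ["除了", "以外"]
p distinct_words("hello, world!")       # => ["hello", "world"]
p distinct_words("。。。")               # => [] (no word characters at all)

# The POSIX bracket form mentioned in the source comment gives the same result here.
p "除了。。。以外。。。".scan(/[[:word:]]+/) # => ["除了", "以外"]
```

The ideographic full stop “。” is punctuation rather than a word character, which is why it splits the input into separate chunks.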

#include_every_char?(word, sentence) ⇒ `Boolean`

Return true if every distinct word as defined by #distinct_words can be found in the given sentence.

Returns:

(Boolean)

# File 'lib/chinese_vocab/modules/helper_methods.rb', line 30

def include_every_char?(word, sentence)
  characters = distinct_words(word)
  characters.all? {|char| sentence.include?(char) }
end
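
The following sketch shows how the check behaves on a few inputs. Both methods are copied as top-level definitions so the snippet is self-contained, and the example sentences are made up for illustration.

```ruby
# Standalone copies of the helpers so the snippet runs without the library.
def distinct_words(word)
  word.scan(/\p{Word}+/)
end

def include_every_char?(word, sentence)
  characters = distinct_words(word)
  characters.all? { |char| sentence.include?(char) }
end

pattern = "除了。。。以外。。。"

p include_every_char?(pattern, "除了咖啡以外,我也喜欢茶。") # => true  (both chunks present)
p include_every_char?(pattern, "我喜欢茶。")                 # => false (neither chunk present)
p include_every_char?(pattern, "除了咖啡,我也喜欢茶。")     # => false ("以外" is missing)
p include_every_char?("。。。", "我喜欢茶。")                # => true  (no chunks, so vacuously true)
```

Despite the method name, the comparison is made per chunk returned by `distinct_words`, not per single character, and the order of the chunks in the sentence is not checked.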

#is_unicode?(word) ⇒ `Boolean`

Returns:

(Boolean)

# File 'lib/chinese_vocab/modules/helper_methods.rb', line 10

def is_unicode?(word)
  # Drop everything that is not a Unicode word character (punctuation, whitespace).
  word = distinct_words(word).join
  # English text at this point contains only characters that are matched by \w.
  # Chinese text at this point contains mostly/only Unicode word characters that are not matched by \w.
  # For Chinese text, 'char_arr' therefore has to be smaller than 'word'.
  char_arr = word.scan(/\w/)
  char_arr.size < word.size
end
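
The method relies on Ruby's `\w` matching only ASCII letters, digits, and the underscore, while `\p{Word}` also matches non-ASCII word characters. A minimal sketch of the resulting behavior, with the methods copied as top-level definitions and sample inputs chosen for illustration:

```ruby
# Standalone copies of the helpers so the snippet runs without the library.
def distinct_words(word)
  word.scan(/\p{Word}+/)
end

def is_unicode?(word)
  word = distinct_words(word).join
  char_arr = word.scan(/\w/) # ASCII word characters only
  char_arr.size < word.size
end

p is_unicode?("你好")        # => true  (no character matches \w)
p is_unicode?("hello")       # => false (every character matches \w)
p is_unicode?("hello 你好")  # => true  (mixed text still counts)
p is_unicode?("caf\u00E9")   # => true  (the accented letter is not matched by \w)
p is_unicode?("")            # => false (0 < 0 is false)
```

As the accented example suggests, the method reports whether a string contains any non-ASCII word character, so it does not distinguish Chinese from other non-ASCII scripts.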
