Class: Corenlp::Punctuation
Constant Summary
- CURRENCY_SYMBOLS =
%W(\u0024 \u00A2 \u00A3 \u00A4 \u00A5 \u20AC)
- DASH_SYMBOLS =
%W(\u2010 \u2011 \u2012 \u2013 \u2014)
- OPENING_SYMBOLS =
%W(\u201C \u2018 \u00A1 \u00BF \( [ {)
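The `%W` literals above process `\uXXXX` escapes the way a double-quoted string does, so each constant is an array of one-character strings. The sketch below copies the three literals verbatim and prints the code point next to each character. It is only an illustration run outside the gem, so the constants here are top-level copies, not the ones defined on Corenlp::Punctuation.

```ruby
# Top-level copies of the literals from the constant summary
# (illustration only; the real constants live on Corenlp::Punctuation).
CURRENCY_SYMBOLS = %W(\u0024 \u00A2 \u00A3 \u00A4 \u00A5 \u20AC)
DASH_SYMBOLS     = %W(\u2010 \u2011 \u2012 \u2013 \u2014)
OPENING_SYMBOLS  = %W(\u201C \u2018 \u00A1 \u00BF \( [ {)

# Print each element with its Unicode code point.
{
  "CURRENCY_SYMBOLS" => CURRENCY_SYMBOLS,
  "DASH_SYMBOLS"     => DASH_SYMBOLS,
  "OPENING_SYMBOLS"  => OPENING_SYMBOLS
}.each do |name, symbols|
  puts name
  symbols.each { |s| puts format("  U+%04X %s", s.ord, s) }
end

# The ASCII hyphen-minus (U+002D) is not in DASH_SYMBOLS.
puts DASH_SYMBOLS.include?("-")  # false
```

Two details follow directly from the literals: `\(` is an escaped delimiter inside `%W(...)` and yields a plain `(`, and DASH_SYMBOLS covers only U+2010 through U+2014, so a token whose text is the ASCII `-` is not matched by it.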
Constants inherited from Token
Token::Enclitics, Token::IGNORED_ENTITIES, Token::NumberRegexp, Token::PunctRegexp, Token::STANFORD_TEXT_REPLACEMENTS, Token::WebsiteRegexp, Token::WordRegexp
Instance Attribute Summary
Attributes inherited from Token
#index, #ner, #penn_treebank_tag, #stanford_lemma, #text, #type
Instance Method Summary
Methods inherited from Token
#==, clean_stanford_text, #content?, #ignored_entity?, #initialize, token_subclass_from_text, #top_level_penn_treebank_category, #website_text?
Constructor Details
This class inherits a constructor from Corenlp::Token
Instance Method Details
#currency? ⇒ Boolean
# File 'lib/corenlp/punctuation.rb', line 8
def currency?
  CURRENCY_SYMBOLS.include?(text)
end
#dash? ⇒ Boolean
# File 'lib/corenlp/punctuation.rb', line 12
def dash?
  DASH_SYMBOLS.include?(text)
end
#opening? ⇒ Boolean
# File 'lib/corenlp/punctuation.rb', line 16
def opening?
  OPENING_SYMBOLS.include?(text)
end
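All three predicates are exact-match membership tests of the token's `text` against one of the constant arrays. The following is a minimal sketch of that behavior, not the gem's own code: `StandInToken` and `StandInPunctuation` are hypothetical classes that model only the `text` attribute, because the signature of the constructor inherited from Corenlp::Token is not shown on this page.

```ruby
# Hypothetical stand-in for Corenlp::Token; only `text` is modeled.
class StandInToken
  attr_reader :text

  def initialize(text)
    @text = text
  end
end

# Hypothetical stand-in for Corenlp::Punctuation, reusing the documented
# constant literals and method bodies.
class StandInPunctuation < StandInToken
  CURRENCY_SYMBOLS = %W(\u0024 \u00A2 \u00A3 \u00A4 \u00A5 \u20AC)
  DASH_SYMBOLS     = %W(\u2010 \u2011 \u2012 \u2013 \u2014)
  OPENING_SYMBOLS  = %W(\u201C \u2018 \u00A1 \u00BF \( [ {)

  def currency?
    CURRENCY_SYMBOLS.include?(text)
  end

  def dash?
    DASH_SYMBOLS.include?(text)
  end

  def opening?
    OPENING_SYMBOLS.include?(text)
  end
end

puts StandInPunctuation.new("\u20AC").currency?  # true  (euro sign)
puts StandInPunctuation.new("\u2014").dash?      # true  (U+2014)
puts StandInPunctuation.new("-").dash?           # false (ASCII hyphen-minus)
puts StandInPunctuation.new("(").opening?        # true
puts StandInPunctuation.new(")").opening?        # false (closing symbol)
```

Because `Array#include?` compares whole strings, a multi-character text such as `"$$"` or `"--"` matches none of the predicates.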