Module: TextRank

Defined in: lib/text_rank.rb,
lib/text_rank/version.rb,
lib/text_rank/tokenizer.rb,
lib/text_rank/char_filter.rb,
lib/text_rank/fingerprint.rb,
lib/text_rank/rank_filter.rb,
lib/text_rank/token_filter.rb,
lib/text_rank/tokenizer/url.rb,
lib/text_rank/graph_strategy.rb,
lib/text_rank/tokenizer/word.rb,
lib/text_rank/tokenizer/money.rb,
lib/text_rank/tokenizer/number.rb,
lib/text_rank/keyword_extractor.rb,
lib/text_rank/fingerprint_overlap.rb,
lib/text_rank/tokenizer/whitespace.rb,
lib/text_rank/char_filter/lowercase.rb,
lib/text_rank/tokenizer/punctuation.rb,
lib/text_rank/char_filter/strip_html.rb,
lib/text_rank/token_filter/stopwords.rb,
lib/text_rank/char_filter/strip_email.rb,
lib/text_rank/token_filter/min_length.rb,
lib/text_rank/char_filter/ascii_folding.rb,
lib/text_rank/rank_filter/sort_by_value.rb,
lib/text_rank/graph_strategy/coocurrence.rb,
lib/text_rank/token_filter/part_of_speech.rb,
lib/text_rank/char_filter/strip_possessive.rb,
lib/text_rank/char_filter/undo_contractions.rb,
lib/text_rank/rank_filter/collapse_adjacent.rb,
lib/text_rank/rank_filter/normalize_probability.rb,
lib/text_rank/rank_filter/normalize_unit_vector.rb

Overview

Provides convenience methods for quickly extracting keywords.

Defined Under Namespace

Modules: CharFilter, GraphStrategy, RankFilter, TokenFilter, Tokenizer
Classes: Fingerprint, FingerprintOverlap, KeywordExtractor

Constant Summary

VERSION = '1.3.1'
Current gem version.

Class Method Summary

.extract_keywords(text, **options) ⇒ Hash<String, Float>
A convenience method for quickly extracting keywords from text with default options.
.extract_keywords_advanced(text, **options) ⇒ Hash<String, Float>
A convenience method for quickly extracting keywords from text with default advanced options.
.similarity(keywords1, keywords2) ⇒ Object

Class Method Details

.extract_keywords(text, **options) ⇒ `Hash<String, Float>`

A convenience method for quickly extracting keywords from text with default options.

Parameters:

text (String, Array<String>) —
text from which to extract keywords

Options Hash (**options):

:char_filters (Array<Class, Symbol, #filter!>) —
A list of filters to be applied prior to tokenization
:tokenizers (Array<Symbol, Regexp, String>) —
A list of tokenizer regular expressions to perform tokenization
:token_filters (Array<Class, Symbol, #filter!>) —
A list of filters to be applied to each token after tokenization
:graph_strategy (Class, Symbol, #build_graph) —
A class or strategy instance for producing a graph from tokens
:rank_filters (Array<Class, Symbol, #filter!>) —
A list of filters to be applied to the keyword ranks after keyword extraction
:strategy (Symbol) —
PageRank strategy to use (either :sparse or :dense)
:damping (Float) —
The probability of following the graph vs. randomly choosing a new node
:tolerance (Float) —
The desired accuracy of the results
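
The options above describe stages that run in a fixed order: character filters, then tokenizers, then token filters, with the graph strategy and rank filters applied afterwards. The following is a minimal pure-Ruby sketch of the first three stages only. The lambdas, the stopword list, and the `run_pipeline` helper are hypothetical stand-ins for illustration; they are not the gem's CharFilter, Tokenizer, or TokenFilter implementations.

```ruby
# Hypothetical stand-ins for the three pre-graph stages (not the gem's classes).
stopwords = %w[a an and in is of the to]

# Stage 1: character filters rewrite the raw text before tokenization.
char_filters = [
  ->(text) { text.downcase },
  ->(text) { text.gsub(/<[^>]+>/, ' ') } # crude HTML tag stripping
]

# Stage 2: the tokenizer is a regular expression that matches one token.
tokenizer = /[a-z]+/

# Stage 3: token filters drop or rewrite tokens after tokenization.
token_filters = [
  ->(tokens) { tokens.reject { |t| stopwords.include?(t) } },
  ->(tokens) { tokens.select { |t| t.length >= 3 } }
]

# Apply each stage in order and return the surviving tokens.
def run_pipeline(text, char_filters, tokenizer, token_filters)
  filtered = char_filters.reduce(text) { |acc, filter| filter.call(acc) }
  tokens = filtered.scan(tokenizer)
  token_filters.reduce(tokens) { |acc, filter| filter.call(acc) }
end

tokens = run_pipeline('<p>The Ranking of the Keywords</p>',
                      char_filters, tokenizer, token_filters)
p tokens # => ["ranking", "keywords"]
```

The order matters: lowercasing before tokenization is what lets the stopword filter match "The" against "the".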

Returns:

(Hash<String, Float>) —
a hash mapping each token to its text rank, sorted by rank in descending order
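
To make the :damping and :tolerance options concrete, here is a small self-contained sketch of a PageRank power iteration over a token co-occurrence graph. It is an illustration under stated assumptions, not the gem's implementation: the `cooccurrence_graph` and `page_rank` helpers, the `window` and `max_iterations` parameters, the default values 0.85 and 0.0001, and the use of a summed absolute change as the convergence test are all assumptions made for this example.

```ruby
# Link tokens that appear within `window` positions of each other.
# Edges are undirected and weighted by how often the pair co-occurs.
def cooccurrence_graph(tokens, window: 2)
  graph = Hash.new { |hash, key| hash[key] = Hash.new(0.0) }
  tokens.each_with_index do |token, i|
    last = [i + window, tokens.length - 1].min
    ((i + 1)..last).each do |j|
      next if token == tokens[j]

      graph[token][tokens[j]] += 1.0
      graph[tokens[j]][token] += 1.0
    end
  end
  graph
end

# Power iteration. `damping` is the probability of following an edge rather
# than jumping to a random node; iteration stops once the total change in
# ranks falls below `tolerance` (an assumed convergence criterion).
def page_rank(graph, damping: 0.85, tolerance: 0.0001, max_iterations: 100)
  nodes = graph.keys
  return {} if nodes.empty?

  ranks = nodes.to_h { |node| [node, 1.0 / nodes.size] }
  max_iterations.times do
    updated = nodes.to_h do |node|
      incoming = graph[node].sum do |neighbor, weight|
        ranks[neighbor] * weight / graph[neighbor].values.sum
      end
      [node, (1.0 - damping) / nodes.size + damping * incoming]
    end
    delta = nodes.sum { |node| (updated[node] - ranks[node]).abs }
    ranks = updated
    break if delta < tolerance
  end
  # Highest rank first, mirroring the descending order of the return value.
  ranks.sort_by { |_, rank| -rank }.to_h
end

graph = cooccurrence_graph(%w[graph rank graph node graph edge])
page_rank(graph).each { |token, rank| puts format('%-6s %.4f', token, rank) }
```

A lower tolerance generally means more iterations and more stable ranks; a lower damping value pulls all ranks toward the uniform value of one over the number of nodes.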



# File 'lib/text_rank.rb', line 26

def self.extract_keywords(text, **options)
  TextRank::KeywordExtractor.basic(**options).extract(text, **options)
end

.extract_keywords_advanced(text, **options) ⇒ `Hash<String, Float>`

A convenience method for quickly extracting keywords from text with default advanced options.