Module: TextRank

Defined in:
lib/text_rank.rb,
lib/text_rank/version.rb,
lib/text_rank/tokenizer.rb,
lib/text_rank/char_filter.rb,
lib/text_rank/fingerprint.rb,
lib/text_rank/rank_filter.rb,
lib/text_rank/token_filter.rb,
lib/text_rank/tokenizer/url.rb,
lib/text_rank/graph_strategy.rb,
lib/text_rank/tokenizer/word.rb,
lib/text_rank/tokenizer/money.rb,
lib/text_rank/tokenizer/number.rb,
lib/text_rank/keyword_extractor.rb,
lib/text_rank/fingerprint_overlap.rb,
lib/text_rank/tokenizer/whitespace.rb,
lib/text_rank/char_filter/lowercase.rb,
lib/text_rank/tokenizer/punctuation.rb,
lib/text_rank/char_filter/strip_html.rb,
lib/text_rank/token_filter/stopwords.rb,
lib/text_rank/char_filter/strip_email.rb,
lib/text_rank/token_filter/min_length.rb,
lib/text_rank/char_filter/ascii_folding.rb,
lib/text_rank/rank_filter/sort_by_value.rb,
lib/text_rank/graph_strategy/coocurrence.rb,
lib/text_rank/token_filter/part_of_speech.rb,
lib/text_rank/char_filter/strip_possessive.rb,
lib/text_rank/char_filter/undo_contractions.rb,
lib/text_rank/rank_filter/collapse_adjacent.rb,
lib/text_rank/rank_filter/normalize_probability.rb,
lib/text_rank/rank_filter/normalize_unit_vector.rb

Overview

Provides convenience methods for quickly extracting keywords.

See Also:

  • README

Defined Under Namespace

Modules: CharFilter, GraphStrategy, RankFilter, TokenFilter, Tokenizer

Classes: Fingerprint, FingerprintOverlap, KeywordExtractor

Constant Summary

VERSION = '1.3.1'

    Current gem version

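Class Method Summary

  • .extract_keywords(text, **options) ⇒ Hash<String, Float>

  • .extract_keywords_advanced(text, **options) ⇒ Hash<String, Float>

  • .similarity(keywords1, keywords2) ⇒ Object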

Class Method Details

.extract_keywords(text, **options) ⇒ Hash<String, Float>

A convenience method that extracts keywords from text using the basic default options.

Parameters:

  • text (String, Array<String>)

    text from which to extract keywords

Options Hash (**options):

  • :char_filters (Array<Class, Symbol, #filter!>)

    A list of filters to be applied prior to tokenization

  • :tokenizers (Array<Symbol, Regexp, String>)

    A list of tokenizers (symbols, regular expressions, or strings) used to split the text into tokens

  • :token_filters (Array<Class, Symbol, #filter!>)

    A list of filters to be applied to each token after tokenization

  • :graph_strategy (Class, Symbol, #build_graph)

    A class or strategy instance for producing a graph from tokens

  • :rank_filters (Array<Class, Symbol, #filter!>)

    A list of filters to be applied to the keyword ranks after keyword extraction

  • :strategy (Symbol)

    PageRank strategy to use (either :sparse or :dense)

  • :damping (Float)

    The probability of following the graph vs. randomly choosing a new node

  • :tolerance (Float)

    The desired accuracy of the results

Returns:

  • (Hash<String, Float>)

    tokens mapped to their text rank, in descending order of rank



# File 'lib/text_rank.rb', line 26

def self.extract_keywords(text, **options)
  TextRank::KeywordExtractor.basic(**options).extract(text, **options)
end
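
Because the documented return value is a Hash<String, Float> in descending order of rank, the first entries are the strongest keywords. The following is a minimal sketch of picking the top few. `top_keywords` is a hypothetical helper, not part of the gem, and the sample hash is a made-up stand-in for a real result.

```ruby
# Hypothetical helper (not part of the gem): take the highest-ranked tokens
# from a Hash<String, Float> that is already in descending rank order.
def top_keywords(ranks, count)
  ranks.first(count).map { |token, _rank| token }
end

# Stand-in for a result shaped like the documented return value of
# TextRank.extract_keywords(text); tokens and scores are invented.
sample_ranks = { 'graph' => 0.41, 'keyword' => 0.33, 'rank' => 0.26 }

puts top_keywords(sample_ranks, 2).inspect # prints ["graph", "keyword"]
```

With the gem loaded, the same helper should work on the Hash returned by `TextRank.extract_keywords(text)`, assuming no rank filter changes the ordering.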

.extract_keywords_advanced(text, **options) ⇒ Hash<String, Float>

A convenience method that extracts keywords from text using the advanced default options.

Parameters:

  • text (String, Array<String>)

    text from which to extract keywords

Options Hash (**options):

  • :char_filters (Array<Class, Symbol, #filter!>)

    A list of filters to be applied prior to tokenization

  • :tokenizers (Array<Symbol, Regexp, String>)

    A list of tokenizers (symbols, regular expressions, or strings) used to split the text into tokens

  • :token_filters (Array<Class, Symbol, #filter!>)

    A list of filters to be applied to each token after tokenization

  • :graph_strategy (Class, Symbol, #build_graph)

    A class or strategy instance for producing a graph from tokens

  • :rank_filters (Array<Class, Symbol, #filter!>)

    A list of filters to be applied to the keyword ranks after keyword extraction

  • :strategy (Symbol)

    PageRank strategy to use (either :sparse or :dense)

  • :damping (Float)

    The probability of following the graph vs. randomly choosing a new node

  • :tolerance (Float)

    The desired accuracy of the results

Returns:

  • (Hash<String, Float>)

    tokens mapped to their text rank, in descending order of rank



# File 'lib/text_rank.rb', line 34

def self.extract_keywords_advanced(text, **options)
  TextRank::KeywordExtractor.advanced(**options).extract(text, **options)
end
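
To make the :damping and :tolerance options more concrete, here is a small, self-contained sketch of a PageRank iteration over an undirected co-occurrence graph. It is not the gem's implementation: `page_rank_sketch`, its default values, the `max_iterations` guard, and the sample edges are all assumptions chosen for illustration.

```ruby
# Illustrative only: a tiny PageRank over an undirected co-occurrence graph.
# Every name and default value here is hypothetical, not taken from the gem.
def page_rank_sketch(edges, damping: 0.85, tolerance: 0.0001, max_iterations: 100)
  # Build an adjacency list from the undirected edges.
  neighbors = Hash.new { |hash, key| hash[key] = [] }
  edges.each do |a, b|
    neighbors[a] << b
    neighbors[b] << a
  end

  nodes = neighbors.keys
  ranks = nodes.to_h { |node| [node, 1.0 / nodes.size] }

  max_iterations.times do
    updated = nodes.to_h do |node|
      # Rank flowing in from each neighbor, split evenly across its edges.
      inflow = neighbors[node].sum { |other| ranks[other] / neighbors[other].size }
      # damping: probability of following an edge rather than jumping
      # to a randomly chosen node.
      [node, (1.0 - damping) / nodes.size + damping * inflow]
    end

    # tolerance: stop once the total change between iterations is small enough.
    change = nodes.sum { |node| (updated[node] - ranks[node]).abs }
    ranks = updated
    break if change < tolerance
  end

  # Highest rank first, mirroring the documented return order.
  ranks.sort_by { |_node, rank| -rank }.to_h
end

edges = [%w[ruby gem], %w[ruby keyword], %w[ruby graph], %w[graph keyword]]
ranks = page_rank_sketch(edges)
```

In this sketch a higher damping value gives more weight to the graph structure, and a smaller tolerance trades extra iterations for a more settled result.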

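.similarity(keywords1, keywords2) ⇒ Object

Builds a TextRank::Fingerprint from each keyword list and returns the similarity between the two fingerprints.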



# File 'lib/text_rank.rb', line 38

def self.similarity(keywords1, keywords2)
  TextRank::Fingerprint.new(*keywords1).similarity(TextRank::Fingerprint.new(*keywords2))
end
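
The method above splats both arguments into `Fingerprint.new`, so what the fingerprint receives depends on whether a Hash or an Array is passed. The sketch below shows only the plain Ruby splat behavior, using a hypothetical stand-in method (`collect_values`) and made-up scores. Which form `Fingerprint` expects is not stated on this page, so check the README before relying on either.

```ruby
# Hypothetical stand-in for a constructor that collects splatted arguments,
# in the way Fingerprint.new(*keywords1) receives them in the method above.
def collect_values(*values)
  values
end

ranks = { 'graph' => 0.41, 'keyword' => 0.33 } # invented scores

as_pairs  = collect_values(*ranks)      # Hash splat yields [token, score] pairs
as_tokens = collect_values(*ranks.keys) # Array splat yields the tokens only

puts as_pairs.inspect
puts as_tokens.inspect
```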