Class: TextRank::RankFilter::CollapseAdjacent

Inherits:
Object
Defined in:
lib/text_rank/rank_filter/collapse_adjacent.rb

Overview

A rank filter which attempts to collapse highly ranked, single-token keywords into a combined keyword when those keywords are adjacent to each other in the original text.

It tries to do this in as intelligent a manner as possible, keeping the single tokens that comprise a combination when one or more of the single tokens occur more often than the combination.
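The rule above can be illustrated with a minimal sketch. The helpers `occurrences` and `tokens_to_keep` are hypothetical names invented for this illustration, not part of TextRank, and the real filter may count occurrences differently.

```ruby
# Hypothetical helper (not part of TextRank): counts whole-word
# occurrences of a phrase in a text.
def occurrences(text, phrase)
  text.scan(/\b#{Regexp.escape(phrase)}\b/).size
end

# Hypothetical helper: returns the single tokens that occur more often
# than the combined phrase and so would be kept alongside it.
def tokens_to_keep(text, tokens)
  combined_count = occurrences(text, tokens.join(" "))
  tokens.select { |token| occurrences(text, token) > combined_count }
end

text = "town siege and another town"
kept = tokens_to_keep(text, %w[town siege])
# "town" occurs twice, while "town siege" and "siege" each occur once,
# so only "town" is kept as a separate keyword.
```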

This filter operates on the original (non-filtered) text in order to distinguish true text adjacency from token adjacency (e.g. two tokens can be adjacent even though they appeared in the original text on separate lines with punctuation in between). However, because it operates on the original text, it may fail to find some combinations when the keyword tokens no longer exactly match the original text (e.g. if ASCII folding has occurred). The goal is to err on the side of caution: it is better to suggest no combination than to suggest a bad one.
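As a rough sketch of the difference between text adjacency and token adjacency, the hypothetical helper below (`adjacent_in_text?` is an invented name, not TextRank API) treats two tokens as adjacent only when spaces or tabs alone separate them in the original text. That separator rule is an assumption made for illustration.

```ruby
# Hypothetical helper (not part of TextRank): true when token +a+ is
# followed by token +b+ in +text+ with only spaces or tabs in between.
def adjacent_in_text?(text, a, b, ignore_case: false)
  flags = ignore_case ? Regexp::IGNORECASE : 0
  pattern = Regexp.new("\\b#{Regexp.escape(a)}[ \\t]+#{Regexp.escape(b)}\\b", flags)
  text.match?(pattern)
end

# Truly adjacent in the text.
same_line = adjacent_in_text?("the town siege began", "town", "siege")

# Adjacent as tokens, but split by punctuation and a line break in the text.
split_lines = adjacent_in_text?("left the town.\nSiege began", "town", "siege", ignore_case: true)

# An ASCII-folded token ("cafe") no longer matches the original text ("café").
folded = adjacent_in_text?("a café society", "cafe", "society")
```

Only the first check succeeds. The other two return false, which matches the cautious behaviour described above: a missed combination is preferred over a bad one.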

Example

  CollapseAdjacent.new(ranks_to_collapse: 6, max_tokens_to_combine: 2).filter!(
    {
      "town"        => 0.9818754334834477,
      "cities"      => 0.9055017128817066,
      "siege"       => 0.7411519524982207,
      "arts"        => 0.6907977453782612,
      "envy"        => 0.6692709808107252,
      "blessings"   => 0.6442147897516214,
      "plagues"     => 0.5972420789430091,
      "florish"     => 0.3746092797528525,
      "devoured"    => 0.36867321734332237,
      "anxieties"   => 0.3367731719604189,
      "peace"       => 0.2905352582752693,
      "inhabitants" => 0.12715120116732137,
      "cares"       => 0.0697383057947685,
    },
    original_text: "cities blessings peace arts florish inhabitants devoured envy cares anxieties plagues town siege"
  )
  => {
    "town siege"        => 0.9818754334834477,
    "cities blessings"  => 0.9055017128817066,
    "arts florish"      => 0.6907977453782612,
    "devoured envy"     => 0.6692709808107252,
    "anxieties plagues" => 0.5972420789430091,
    "peace"             => 0.2905352582752693,
    "inhabitants"       => 0.12715120116732137,
    "cares"             => 0.0697383057947685,
    "town siege"        => 0.2365184450186848,
    "cities blessings"  => 0.21272821337880285,
    "arts florish"      => 0.146247479840506,
    "devoured envy"     => 0.1424776818760168,
    "anxieties plagues" => 0.12821144722639122,
    "peace"             => 0.07976303576999531,
    "inhabitants"       => 0.03490786580297893,
    "cares"             => 0.019145831086624026,
  }

Instance Method Summary

Constructor Details

#initialize(**options) ⇒ CollapseAdjacent

Returns a new instance of CollapseAdjacent.

Parameters:

  • options (Hash)

    a customizable set of options

Options Hash (**options):

  • ranks_to_collapse (Integer)

    the top N ranks in which to look for collapsible keywords

  • max_tokens_to_combine (Integer)

    the maximum number of tokens to collapse into a combined keyword

  • ignore_case (true, false)

    whether to ignore case when finding adjacent keywords in original text

  • delimiter (String)

    an optional delimiter between adjacent keywords in original text
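To make the first two options concrete, here is a minimal sketch of how they might bound the search for combinations. The method `candidate_phrases` is hypothetical, and enumerating permutations of the top-ranked tokens is an assumption for illustration, not a description of the library's internals.

```ruby
# Hypothetical helper (not part of TextRank): lists the combined phrases
# that could be looked up in the original text.
def candidate_phrases(ranks, ranks_to_collapse:, max_tokens_to_combine:)
  # Only the top N ranked tokens are considered for collapsing.
  top = ranks.sort_by { |_, score| -score }.first(ranks_to_collapse).map(&:first)

  # Every ordering of 2..max tokens is a candidate phrase.
  (2..max_tokens_to_combine).flat_map do |size|
    top.permutation(size).map { |tokens| tokens.join(" ") }
  end
end

ranks = { "town" => 0.98, "cities" => 0.91, "siege" => 0.74, "arts" => 0.69 }
candidates = candidate_phrases(ranks, ranks_to_collapse: 3, max_tokens_to_combine: 2)
# "arts" falls outside the top 3, so no candidate contains it.
```

Larger values for either option widen the search, which is why both are capped by the caller.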



# File 'lib/text_rank/rank_filter/collapse_adjacent.rb', line 66

def initialize(**options)
  @options = options
end

Instance Method Details

#filter!(ranks, original_text:, **_) ⇒ Hash<String, Float>

Performs the filter on the ranks.

Parameters:

  • ranks (Hash<String, Float>)

    the results of the PageRank algorithm

  • original_text (String)

    the original text (pre-tokenization) from which to find collapsible keywords

Returns:

  • (Hash<String, Float>)


# File 'lib/text_rank/rank_filter/collapse_adjacent.rb', line 74

def filter!(ranks, original_text:, **_)
  TokenCollapser.new(tokens: ranks, text: original_text, **@options).collapse
end
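The signature above takes a required `original_text:` keyword and discards any other keywords through `**_`. The stand-in class below (`PresentInTextFilter`, a hypothetical name that is not part of TextRank) sketches that same shape, so the calling convention can be tried without the gem installed. The `language:` keyword in the call is likewise invented for the example.

```ruby
# Hypothetical stand-in with the same method shape as a rank filter.
class PresentInTextFilter
  def initialize(**options)
    @options = options
  end

  # Keeps only the tokens found verbatim in the original text.
  # Unrecognized keywords are swallowed by **_ instead of raising.
  def filter!(ranks, original_text:, **_)
    ranks.select { |token, _score| original_text.include?(token) }
  end
end

result = PresentInTextFilter.new.filter!(
  { "town" => 0.9, "moon" => 0.1 },
  original_text: "town siege",
  language: "en" # extra keyword, ignored by the filter
)
```

Accepting and ignoring extra keywords lets a caller pass one shared set of arguments to every filter in a chain, with each filter picking out only what it needs.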