Class: TextRank::TokenFilter::PartOfSpeech

Inherits:
Object
Defined in:
lib/text_rank/token_filter/part_of_speech.rb

Overview

Token filter that keeps only the tokens tagged with a selected set of parts of speech.

Example

PartOfSpeech.new(parts_to_keep: %w[nn nns]).filter!(%w[ all men are by nature free ])
=> ["men", "nature"]

Instance Method Summary

Constructor Details

#initialize(parts_to_keep: %w[nn nnp nnps nns jj jjr jjs vb vbd vbg vbn vbp vbz], **_) ⇒ PartOfSpeech

Returns a new instance of PartOfSpeech.

Parameters:

  • parts_to_keep (Array<String>) (defaults to: %w[nn nnp nnps nns jj jjr jjs vb vbd vbg vbn vbp vbz])

    List of EngTagger part-of-speech tags to keep

# File 'lib/text_rank/token_filter/part_of_speech.rb', line 19

def initialize(parts_to_keep: %w[nn nnp nnps nns jj jjr jjs vb vbd vbg vbn vbp vbz], **_)
  @parts_to_keep = Set.new(parts_to_keep)
  @eng_tagger = EngTagger.new
  @last_pos_tag = 'pp'
end
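
The constructor signature combines a keyword argument with a default, a catch-all `**_` that discards any other keyword options, and a `Set` for membership checks. The following is a minimal sketch of that pattern only; `ToyFilter` and the `unrelated_option` keyword are hypothetical names used for illustration, and the sketch deliberately omits the EngTagger setup.

```ruby
require 'set'

# Hypothetical class that mirrors only the shape of the constructor above.
# It is not part of TextRank and does not load EngTagger.
class ToyFilter
  attr_reader :parts_to_keep

  # `**_` accepts and ignores any keyword options other than parts_to_keep.
  def initialize(parts_to_keep: %w[nn nns], **_)
    # A Set collapses duplicate tags and gives fast include? checks.
    @parts_to_keep = Set.new(parts_to_keep)
  end
end

default_filter = ToyFilter.new
custom_filter  = ToyFilter.new(parts_to_keep: %w[jj jj], unrelated_option: true)

puts default_filter.parts_to_keep.to_a.inspect # ["nn", "nns"]
puts custom_filter.parts_to_keep.to_a.inspect  # ["jj"]
```

Ignoring unknown keywords this way lets one options hash be shared across several components without each constructor raising on keys it does not use.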

Instance Method Details

#filter!(tokens) ⇒ Array<String>

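Performs the filter in place: tokens whose part-of-speech tag is not in parts_to_keep are removed from the given array, and that same array is returned.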

Parameters:

  • tokens (Array<String>)

Returns:

  • (Array<String>)


# File 'lib/text_rank/token_filter/part_of_speech.rb', line 28

def filter!(tokens)
  tokens.keep_if do |token|
    @parts_to_keep.include?(pos_tag(token))
  end
end
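
Because `filter!` is built on `Array#keep_if`, it mutates the array it receives and returns that same object. The sketch below illustrates this behaviour under a stated assumption: `TOY_TAGS` is a hypothetical, hand-written token-to-tag table standing in for the tagger, whereas the real class obtains tags through `pos_tag` and EngTagger. `toy_filter!` is likewise a hypothetical helper.

```ruby
require 'set'

# Hypothetical stand-in for the tagger: a fixed token => tag lookup.
# These tags are assigned by hand for illustration only.
TOY_TAGS = {
  'all' => 'det', 'men' => 'nns', 'are' => 'vbp',
  'by' => 'in', 'nature' => 'nn', 'free' => 'jj'
}.freeze

# Keeps only the tokens whose (toy) tag is in parts_to_keep.
# keep_if deletes the rejected elements from `tokens` itself.
def toy_filter!(tokens, parts_to_keep)
  keep = Set.new(parts_to_keep)
  tokens.keep_if { |token| keep.include?(TOY_TAGS[token]) }
end

tokens = %w[all men are by nature free]
result = toy_filter!(tokens, %w[nn nns])

puts result.inspect        # ["men", "nature"]
puts tokens.inspect        # ["men", "nature"] (the input was modified)
puts result.equal?(tokens) # true
```

Callers that need the original token list afterwards can pass a copy, for example `tokens.dup`.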