Class: KeywordSearcher

Inherits:
Object
Defined in:
lib/ppr/keyword_searcher.rb

Overview

Tool for looking for keywords within a string.

Instance Method Summary

Constructor Details

#initialize(separator = "") ⇒ KeywordSearcher

Creates a new keyword searcher, where words are delimited by the separator regular expression.


# File 'lib/ppr/keyword_searcher.rb', line 11

def initialize(separator = "")
    # Checks and set the separator.
    @separator = Regexp.new(separator).to_s
    # Initialize the inner map.
    @map = {}
    # Initialize the list of keywords
    @keywords = []
    # Initialize the keyword extraction regular expression
    @keyword_extract = //
end
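
The constructor stores the separator as an embeddable pattern string rather than as a Regexp object. The following is a minimal sketch of why that conversion is useful; it uses only core Ruby, and the `\s` separator and the keyword `foo` are illustrative assumptions, not values taken from the library.

```ruby
# Regexp#to_s returns the pattern wrapped in a non-capturing group with its
# flags spelled out, so the result can be concatenated safely.
separator = Regexp.new("\\s").to_s        # "(?-mix:\\s)"

# The default empty separator becomes an empty group that matches anywhere.
empty = Regexp.new("").to_s               # "(?-mix:)"

# Because of the grouping, an alternation inside the separator cannot leak
# into the surrounding pattern.
either = Regexp.new("a|b").to_s           # "(?-mix:a|b)"

# The embeddable strings are concatenated around a keyword (here "foo").
rexp  = Regexp.new(separator + "foo" + separator)
match = rexp.match("a foo b")
```

With a separator that consumes a character, as `\s` does here, the match includes the surrounding separators (`" foo "`), which is why `find` later isolates the keyword from the match.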

Instance Method Details

#[](keyword) ⇒ Object

Gets the object associated with keyword.


# File 'lib/ppr/keyword_searcher.rb', line 46

def [](keyword)
    return @map[keyword.to_s]
end

#[]=(keyword, object) ⇒ Object

Adds a keyword to the searcher and associates it with object.


# File 'lib/ppr/keyword_searcher.rb', line 30

def []=(keyword,object)
    # Ensure the keyword is a valid string.
    keyword = keyword.to_str
    unless /^[A-Za-z_]\w*$/.match(keyword)
        raise "Invalid string for a keyword: #{keyword}." 
    end
    # Update the map.
    @map[keyword] = object
    # Get the keywords sorted in reverse order (used for building the
    # searching regular expressions).
    @keywords = @map.keys.sort!.reverse!
    # Update the searching regular expression.
    @keyword_extract = Regexp.new(@keywords.join("|"))
end
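
The listing above validates the keyword and then rebuilds the extraction expression from the keywords sorted in reverse order. The sketch below illustrates both steps with core Ruby only; the keyword list and the constant name `VALID` are assumptions made for the example.

```ruby
# Same validity pattern as in the listing: an identifier-like word.
VALID = /^[A-Za-z_]\w*$/
accepted = VALID.match("my_key1")   # MatchData: valid keyword
rejected = VALID.match("2fast")     # nil: starts with a digit

# Regexp alternation takes the first alternative that matches at a position,
# so a keyword must come before any keyword that is a prefix of it.
keywords = ["foo", "bar", "foobar"]
sorted   = keywords.sort.reverse    # ["foobar", "foo", "bar"]

longest_first = Regexp.new(sorted.join("|"))
naive         = Regexp.new(keywords.join("|"))

long_hit  = longest_first.match("foobar")[0]   # "foobar"
short_hit = naive.match("foobar")[0]           # "foo"
```

Reverse lexicographic order is enough here because a string always sorts before any longer string that starts with it, so reversing puts the longer keyword first.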

#each_in(text) ⇒ Object

Searches for each keyword in text and, for every occurrence found, calls the block with the associated object and the range in the string where the keyword was found.

Returns an enumerator if no block is given.

NOTE: keywords that are part of a longer keyword are ignored.


# File 'lib/ppr/keyword_searcher.rb', line 89

def each_in(text)
    return to_enum(:each_in,text) unless block_given?
    # Check and clone the text to avoid side effects.
    text = text.to_s.clone
    # Look for a first keyword.
    macro,range = find(text)
    while macro do
        # Delete the range from the text.
        text[range] = " " * (range.last-range.first+1)
        # Apply the block
        yield(macro,range)
        # Look for the next macro if any
        # print "text = #{text}\n"
        macro,range = find(text)
    end
end
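
The loop above finds one occurrence at a time and overwrites it with spaces in a private copy of the text, so the same occurrence is not found twice while later offsets stay valid. The following is a minimal standalone sketch of that blank-out technique; `BlankOutSketch.each_match` is a hypothetical helper written for illustration and is not part of KeywordSearcher.

```ruby
module BlankOutSketch
  # Hypothetical helper: yields each matched word with its range in the text.
  def self.each_match(text, pattern)
    return to_enum(:each_match, text, pattern) unless block_given?
    text = text.to_s.dup                      # copy, so the caller's string is untouched
    while (m = pattern.match(text))
      word = m[0]
      break if word.empty?                    # guard: an empty match would never advance
      range = m.begin(0)..(m.end(0) - 1)
      text[range] = " " * range.size          # same length, so offsets do not shift
      yield word, range
    end
  end
end

original = "x + y * x"
hits = BlankOutSketch.each_match(original, /x|y/).to_a
# hits is [["x", 0..0], ["y", 4..4], ["x", 8..8]] and original is unchanged
```

Replacing the match with a run of spaces of equal length, rather than deleting it, is what keeps every reported range valid against the original string.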

#find(text, skip = []) ⇒ Object

Searches for a keyword in text and returns the associated object together with the range in the string where the keyword was found, or nil if no keyword is found.

If a keyword is in skip, it is ignored.

NOTE: only the first match is returned.


# File 'lib/ppr/keyword_searcher.rb', line 56

def find(text,skip = [])
    # print "skip=#{skip} @keywords=#{@keywords}\n"
    # Compute the regular expression for finding the keywords.
    rexp = Regexp.new( (@keywords - skip).map! do |k|
        @separator + k + @separator
    end.join("|") )
    # print "find with @rexp=#{@rexp}\n"
    # Look for the first keyword.
    matched = rexp.match(text)
    # Isolate the keyword from the separators.
    # found = @keywords.match(matched.to_s)
    found = @keyword_extract.match(matched.to_s)
    if found then
        found = found.to_s
        # A keyword is found, adjust the range and 
        # return it with the corresponding object.
        range = matched.offset(0)
        range[0] += matched.to_s.index(found)
        range[1] = range[0] + found.size - 1
        return [ @map[found], range[0]..range[1] ]
    else
        # A keyword is not found.
        return nil
    end
end
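
The range arithmetic in the listing above can be followed step by step in isolation. The sketch below assumes a whitespace separator and a single keyword `foo`, both chosen only for illustration, and mirrors the two-stage match: first the keyword with its separators, then the keyword alone inside that match.

```ruby
separator = Regexp.new("\\s").to_s
keywords  = ["foo"]

# Stage 1: keyword surrounded by separators, as built in find.
rexp    = Regexp.new(keywords.map { |k| separator + k + separator }.join("|"))
# Stage 2: keyword alone, as stored in @keyword_extract.
extract = Regexp.new(keywords.join("|"))

text    = "a foo b"
matched = rexp.match(text)                    # matches " foo " starting at index 1
found   = extract.match(matched.to_s).to_s    # "foo"

# Shift the start past the leading separator, then span the keyword itself.
first = matched.begin(0) + matched.to_s.index(found)
range = first..(first + found.size - 1)       # 2..4
```

Note that a separator which consumes a character, such as `\s`, only matches a keyword that has such a character on both sides, so a keyword at the very start or end of the text would not be found with this particular separator.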

#to_h ⇒ Object

Converts to a hash: actually returns self.

NOTE: for duck-typing purposes.


# File 'lib/ppr/keyword_searcher.rb', line 25

def to_h
    return self
end