Class: KeywordLinker

Inherits:
Object
Defined in:
lib/keyword_linker.rb

Overview

Given a set of keywords and URLs, and optionally HTML attributes to set on the links, KeywordLinker takes text and hyperlinks each specified keyword to its associated URL. Example:

linker = KeywordLinker.new
linker.add_url('http://www.latimes.com', 'Los Angeles Times')
linker.link_text("Let's check out the Los Angeles Times!")
=> "Let's check out the <a href=\"http://www.latimes.com\">Los Angeles Times</a>!"

KeywordLinker depends on hpricot for parsing HTML. Parsing the input prevents hyperlinks from being added inside existing hyperlinks or inside attribute text.

Constant Summary

@@blacklist_strategy = Object.new

Constructor Details

#initialize(*lookups) ⇒ KeywordLinker

Takes an optional array of lookup objects. A lookup object is anything that responds to the process method and returns an array of Match objects, including KeywordLinker, KeywordProspector, and LookupChain objects. If multiple objects are specified, a LookupChain is created that gives highest priority to matches from objects closer to the end of the array.



# File 'lib/keyword_linker.rb', line 38

def initialize(*lookups)
  @tree_initialized=true

  if(lookups)
    @lookup = LookupChain.new(lookups)
  end
end
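
The priority rule described above can be illustrated with a minimal sketch. It assumes that "highest priority" means a match from a later lookup replaces a match from an earlier lookup covering the same span of text; the library's actual LookupChain may resolve conflicts differently. SketchMatch, FixedLookup, and SketchChain are hypothetical stand-ins written for this example, not classes from the library.

```ruby
# Hypothetical stand-ins for illustration; these are not the library's classes.
SketchMatch = Struct.new(:keyword, :start_idx, :end_idx, :output)

# A lookup object: anything with a process(text) method that returns matches.
class FixedLookup
  def initialize(table)
    @table = table # keyword => output (here, a URL string)
  end

  # Report the first occurrence of each keyword found in the text.
  def process(text)
    @table.map { |keyword, output|
      idx = text.index(keyword)
      SketchMatch.new(keyword, idx, idx + keyword.length, output) if idx
    }.compact
  end
end

# Chains lookups; on identical spans, the lookup nearest the end of the array wins.
class SketchChain
  def initialize(lookups)
    @lookups = lookups
  end

  def process(text)
    by_span = {}
    @lookups.each do |lookup|
      lookup.process(text).each do |m|
        by_span[[m.start_idx, m.end_idx]] = m # later lookups overwrite earlier ones
      end
    end
    by_span.values
  end
end

general  = FixedLookup.new('Times' => 'http://general.example/times')
specific = FixedLookup.new('Times' => 'http://specific.example/times')
chain    = SketchChain.new([general, specific])
chain_matches = chain.process('Los Angeles Times')
```

Here both lookups match "Times" at the same position, and the single surviving match carries the URL from specific, the lookup passed last.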

Instance Method Details

#add_url(url, keyword, html_attributes = {}) ⇒ Object

Takes a URL and a keyword String or Array of keywords, and adds them to the tree of keywords in the KeywordLinker. Takes an optional hash of HTML attributes to be associated with this URL.

Each URL is linked at most once. If multiple keywords are specified, only the first occurrence of any of them is linked to the target URL. That is, if multiple keywords match for this URL, only one instance of one keyword will be linked.



# File 'lib/keyword_linker.rb', line 54

def add_url(url, keyword, html_attributes={})
  init_lookup

  strategy = HyperlinkStrategy.new(url, html_attributes)
  strategy.keywords = keyword

  @dl.add(strategy)
end
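
The "one link per URL" rule can be sketched on plain strings as follows. This is an illustration only: link_first is a hypothetical helper, it does no HTML parsing or escaping, and it does not reflect how KeywordLinker finds keywords internally.

```ruby
# Hypothetical helper: link only the earliest occurrence of any keyword for one URL.
def link_first(text, url, keywords)
  # Record where each keyword first occurs, skipping keywords that are absent.
  hits = Array(keywords).map { |k| [text.index(k), k] }.reject { |idx, _| idx.nil? }
  return text if hits.empty?

  # Earliest position wins; at the same position, prefer the longer keyword.
  idx, keyword = hits.min_by { |i, k| [i, -k.length] }
  text[0...idx] + %(<a href="#{url}">#{keyword}</a>) + text[(idx + keyword.length)..-1]
end

linked = link_first('The LA Times and the Los Angeles Times',
                    'http://www.latimes.com',
                    ['Los Angeles Times', 'LA Times'])
# "LA Times" is linked because it comes first; "Los Angeles Times" stays plain.
```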

#blacklist_keyword(keyword) ⇒ Object

Blacklist this keyword or array of keywords. If a keyword is blacklisted, it will not be linked. For example, if the “Los Angeles” part of “Los Angeles Times” is getting linked, you can blacklist “Los Angeles Times” to keep that phrase, including the “Los Angeles” inside it, from being linked.



# File 'lib/keyword_linker.rb', line 67

def blacklist_keyword(keyword)
  init_lookup

  @dl.add(keyword, @@blacklist_strategy)
end
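
One way to picture how a blacklisted phrase suppresses a shorter keyword inside it is sketched below. It assumes overlaps are resolved in favor of the longest match, and that matches tagged with a shared sentinel object (echoing @@blacklist_strategy) are then discarded. Hit, BLACKLISTED, find_hits, and resolve are hypothetical names, and the naive scan stands in for the library's keyword tree.

```ruby
# Sentinel marking blacklisted keywords; compared by identity, never by value.
BLACKLISTED = Object.new

Hit = Struct.new(:start_idx, :end_idx, :output)

# Find every occurrence of every keyword (naive scan, for illustration only).
def find_hits(text, table)
  hits = []
  table.each do |keyword, output|
    pos = 0
    while (idx = text.index(keyword, pos))
      hits << Hit.new(idx, idx + keyword.length, output)
      pos = idx + 1
    end
  end
  hits
end

# Keep the longest hit among overlapping ones, then drop blacklisted hits.
def resolve(hits)
  kept = []
  hits.sort_by { |h| [-(h.end_idx - h.start_idx), h.start_idx] }.each do |h|
    overlaps = kept.any? { |k| h.start_idx < k.end_idx && k.start_idx < h.end_idx }
    kept << h unless overlaps
  end
  kept.reject { |h| h.output.equal?(BLACKLISTED) }.sort_by(&:start_idx)
end

table = { 'Los Angeles' => 'http://la.example', 'Los Angeles Times' => BLACKLISTED }
text  = 'Visit Los Angeles and read the Los Angeles Times'
linkable = resolve(find_hits(text, table))
# Only the standalone "Los Angeles" survives; the one inside the blacklisted phrase does not.
```

The blacklisted phrase wins the overlap because it is longer, and is then thrown away, which is why the shorter keyword inside it never gets linked.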

#init_tree ⇒ Object

Initialize the tree after all URLs have been added. This needs to be called only once. If you don’t call init_tree, it is called automatically on the first call to the process or link_text method. That can be inconvenient if it happens on the first request to your application and you’ve constructed a large set of links. Adding URLs after calling init_tree, process, or link_text is not supported.



# File 'lib/keyword_linker.rb', line 79

def init_tree
  unless @tree_initialized
    @dl.construct_fail
    @tree_initialized = true
  end
end
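
The build-once guard used by init_tree, and the trade-off between building eagerly at startup and lazily on first use, can be sketched with a stand-in class. LazyIndex is hypothetical and only mimics the pattern. Raising when a keyword is added after the build is a choice made for this sketch; the documentation above says only that doing so is not supported.

```ruby
# Hypothetical stand-in showing the build-once guard; not KeywordLinker itself.
class LazyIndex
  attr_reader :build_count

  def initialize
    @keywords = []
    @built = false
    @build_count = 0
  end

  def add(keyword)
    raise 'cannot add keywords after the index is built' if @built
    @keywords << keyword
  end

  # Safe to call repeatedly; the expensive step runs only the first time.
  def build
    return if @built
    @sorted = @keywords.sort_by { |k| -k.length } # stand-in for the costly step
    @build_count += 1
    @built = true
  end

  def lookup(text)
    build # lazy fallback if the caller never built explicitly
    @sorted.select { |k| text.include?(k) }
  end
end

index = LazyIndex.new
index.add('Times')
index.add('Los Angeles Times')
index.build # pay the cost at startup rather than on the first lookup
found = index.lookup('Los Angeles Times')
```

Calling build during application startup moves the one-time cost out of the first request, which is the situation the paragraph above warns about.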

#link_text(text) ⇒ Object

Adds links to known URLs into the text provided. Only the first instance of each keyword or set of keywords associated with a URL is linked. In cases of overlap, the longest keyword is chosen to resolve the overlap.



# File 'lib/keyword_linker.rb', line 89

def link_text(text)
  init_tree unless @tree_initialized

  linked_outputs = Set.new

  htext = Hpricot(text)

  link_text_in_elem(htext, linked_outputs)

  return htext.to_s
end

#process(text) ⇒ Object

Returns an array of matches in the specified text. Doesn’t filter overlaps or parse HTML to prevent matches in attribute text or inside of existing hyperlinks. Primarily for internal use.



# File 'lib/keyword_linker.rb', line 104

def process(text)
  init_tree unless @tree_initialized

  @lookup.process(text)
end