Class: Anemone::PageHash
- Inherits: Hash
- Inheritance chain: Object < Hash < Anemone::PageHash
- Defined in: lib/anemone/page_hash.rb
Instance Method Summary
- #[](index) ⇒ Object
  The hash is typically indexed with a URI; the index is converted to a String for easier retrieval.
- #[]=(index, other) ⇒ Object
- #has_key?(key) ⇒ Boolean
- #has_page?(url) ⇒ Boolean
  Does this PageHash contain the specified URL? HTTP and HTTPS versions of a URL are considered to be the same page.
- #pages_linking_to(urls) ⇒ Object
  If given a single URL (as a String or URI), returns an Array of Pages which link to that URL. If given an Array of URLs, returns a Hash (URI => [Page, Page…]) of Pages linking to those URLs.
- #shortest_paths!(root) ⇒ Object
  Use a breadth-first search to calculate the single-source shortest paths from root to all pages in the PageHash.
- #uniq ⇒ Object
  Returns a new PageHash by removing redirect-aliases for each non-redirect Page.
- #urls_linking_to(urls) ⇒ Object
  If given a single URL (as a String or URI), returns an Array of URLs which link to that URL. If given an Array of URLs, returns a Hash (URI => [URI, URI…]) of URLs linking to those URLs.
Instance Method Details
#[](index) ⇒ Object
The hash is typically indexed with a URI; the index is converted to a String for easier retrieval.

    # File 'lib/anemone/page_hash.rb', line 6
    def [](index)
      super(index.to_s)
    end
#[]=(index, other) ⇒ Object

    # File 'lib/anemone/page_hash.rb', line 10
    def []=(index, other)
      super(index.to_s, other)
    end
#has_key?(key) ⇒ Boolean

    # File 'lib/anemone/page_hash.rb', line 14
    def has_key?(key)
      super(key.to_s)
    end
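
The three accessors above all normalize their key with `to_s`, so a URI and its String form address the same entry. A minimal sketch of that pattern, using a hypothetical `StringKeyedHash` class (not part of Anemone) so it runs without the library:

```ruby
require 'uri'

# Hypothetical stand-in for PageHash: every key passes through #to_s.
class StringKeyedHash < Hash
  def [](index)
    super(index.to_s)
  end

  def []=(index, other)
    super(index.to_s, other)
  end

  def has_key?(key)
    super(key.to_s)
  end
end

pages = StringKeyedHash.new
pages[URI('http://example.com/')] = :home_page   # stored under a String key

by_string = pages['http://example.com/']         # lookup with a String
by_uri    = pages[URI('http://example.com/')]    # lookup with a URI
```

Only the overridden methods normalize the key in this sketch; other Hash methods such as `fetch` or `key?` would still compare against the stored Strings directly.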
#has_page?(url) ⇒ Boolean
Does this PageHash contain the specified URL? HTTP and HTTPS versions of a URL are considered to be the same page.

    # File 'lib/anemone/page_hash.rb', line 20
    def has_page?(url)
      schemes = %w(http https)
      if schemes.include? url.scheme
        u = url.dup
        return schemes.any? { |s| u.scheme = s; has_key?(u) }
      end

      has_key?(url)
    end
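
To illustrate the scheme-swapping check, here is a small standalone sketch. The helper name `known_page?` and the `known` list are hypothetical; the sketch only assumes Ruby's standard `uri` library.

```ruby
require 'uri'

# Hypothetical helper mirroring the check in has_page?.
# `known` is any collection of URL strings.
def known_page?(known, url)
  schemes = %w[http https]
  return known.include?(url.to_s) unless schemes.include?(url.scheme)

  candidate = url.dup                     # leave the caller's URI untouched
  schemes.any? do |scheme|
    candidate.scheme = scheme             # try the URL under each scheme
    known.include?(candidate.to_s)
  end
end

known = ['http://example.com/about']

https_match = known_page?(known, URI('https://example.com/about'))
http_match  = known_page?(known, URI('http://example.com/about'))
other_page  = known_page?(known, URI('https://example.com/other'))
```

Here `https_match` and `http_match` are both true because only the scheme differs from the stored URL, while `other_page` is false.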
#pages_linking_to(urls) ⇒ Object
If given a single URL (as a String or URI), returns an Array of Pages which link to that URL. If given an Array of URLs, returns a Hash (URI => [Page, Page…]) of Pages linking to those URLs.

    # File 'lib/anemone/page_hash.rb', line 93
    def pages_linking_to(urls)
      unless urls.is_a?(Array)
        urls = [urls] unless urls.is_a?(Array)
        single = true
      end

      urls.map! do |url|
        if url.is_a?(String)
          URI(url) rescue nil
        else
          url
        end
      end
      urls.compact

      links = {}
      urls.each { |url| links[url] = [] }
      values.each do |page|
        urls.each { |url| links[url] << page if page.links.include?(url) }
      end

      if single and !links.empty?
        return links.first
      else
        return links
      end
    end
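
The reverse-link lookup described above can be sketched independently of the library. The `LinkedPage` struct and `linking_pages` helper below are hypothetical, and the sketch follows the documented return shapes (a bare Array for a single URL, a Hash for an Array of URLs); it does not handle unparseable URL strings.

```ruby
require 'uri'

# Hypothetical stand-in for a crawled page: its own URL and its outgoing links.
LinkedPage = Struct.new(:url, :links)

# Returns the pages linking to `urls`, shaped as the description states.
def linking_pages(pages, urls)
  single = !urls.is_a?(Array)
  targets = (single ? [urls] : urls).map { |u| u.is_a?(String) ? URI(u) : u }

  links = targets.to_h { |target| [target, []] }
  pages.each do |page|
    targets.each { |target| links[target] << page if page.links.include?(target) }
  end

  single ? links[targets.first] : links
end

home = LinkedPage.new(URI('http://example.com/'),  [URI('http://example.com/a')])
a    = LinkedPage.new(URI('http://example.com/a'), [])
b    = LinkedPage.new(URI('http://example.com/b'),
                      [URI('http://example.com/a'), URI('http://example.com/')])
pages = [home, a, b]

to_a    = linking_pages(pages, 'http://example.com/a')       # Array of pages
to_home = linking_pages(pages, [URI('http://example.com/')]) # Hash keyed by URI
```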
#shortest_paths!(root) ⇒ Object
Use a breadth-first search to calculate the single-source shortest paths from root to all pages in the PageHash.

    # File 'lib/anemone/page_hash.rb', line 34
    def shortest_paths!(root)
      root = URI(root) if root.is_a?(String)
      raise "Root node not found" if !has_key?(root)

      each_value {|p| p.visited = false if p}

      q = Queue.new

      q.enq(root)
      self[root].depth = 0
      self[root].visited = true
      while(!q.empty?)
        url = q.deq

        next if !has_key?(url)

        page = self[url]

        page.links.each do |u|
          next if !has_key?(u) or self[u].nil?
          link = self[u]
          aliases = [link].concat(link.aliases.map {|a| self[a] })

          aliases.each do |node|
            if node.depth.nil? or page.depth + 1 < node.depth
              node.depth = page.depth + 1
            end
          end

          q.enq(self[u].url) if !self[u].visited
          self[u].visited = true
        end
      end

      self
    end
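
As a rough illustration of the breadth-first idea, the sketch below computes link depths over a plain adjacency Hash. The `graph` data and the `link_depths` helper are hypothetical; unlike the method above, the sketch uses an Array as its queue, returns a separate depth table instead of mutating pages, and ignores redirect aliases.

```ruby
# Hypothetical link graph: each key is a URL, each value lists the URLs it links to.
graph = {
  '/'       => ['/a', '/b'],
  '/a'      => ['/c'],
  '/b'      => ['/c', '/missing'],
  '/c'      => ['/'],
  '/orphan' => []
}

# Breadth-first search from root; returns { url => depth } for reachable pages.
def link_depths(graph, root)
  raise ArgumentError, 'Root node not found' unless graph.key?(root)

  depths = { root => 0 }
  queue = [root]
  until queue.empty?
    url = queue.shift
    graph[url].each do |target|
      next unless graph.key?(target)  # skip links to pages that were never crawled
      next if depths.key?(target)     # the first visit already has the shortest depth
      depths[target] = depths[url] + 1
      queue << target
    end
  end
  depths
end

depths = link_depths(graph, '/')
```

Because every link has the same cost, the first time the search reaches a page is along a shortest path, so `'/c'` gets depth 2 even though two pages link to it, and `'/orphan'` never receives a depth.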
#uniq ⇒ Object
Returns a new PageHash by removing redirect-aliases for each non-redirect Page.

    # File 'lib/anemone/page_hash.rb', line 75
    def uniq
      results = PageHash.new
      each do |url, page|
        #if none of the aliases of this page have been added, and this isn't a redirect page, add this page
        page_added = page.aliases.inject(false) { |r, a| r ||= results.has_key? a}
        if !page.redirect? and !page_added
          results[url] = page.clone
          results[url].aliases = []
        end
      end

      results
    end
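
The filtering rule in `uniq` can be tried out on plain data. The sketch below uses a hypothetical `SketchPage` struct and `without_aliases` helper, with a regular Hash in place of a PageHash; the example URLs and alias lists are made up for illustration.

```ruby
# Hypothetical stand-in for Anemone::Page with only the fields the rule needs.
SketchPage = Struct.new(:url, :aliases, :redirect) do
  def redirect?
    redirect
  end
end

# Keeps a page only if it is not a redirect and none of its aliases was kept already.
def without_aliases(pages)
  results = {}
  pages.each do |url, page|
    alias_added = page.aliases.any? { |a| results.key?(a) }
    next if page.redirect? || alias_added

    copy = page.dup      # copy so the original page keeps its alias list
    copy.aliases = []
    results[url] = copy
  end
  results
end

pages = {
  'http://example.com/old'  => SketchPage.new('http://example.com/old', [], true),
  'http://example.com/new'  => SketchPage.new('http://example.com/new',
                                              ['http://example.com/old', 'http://example.com/copy'], false),
  'http://example.com/copy' => SketchPage.new('http://example.com/copy',
                                              ['http://example.com/new'], false)
}

unique = without_aliases(pages)
```

In this data set only `http://example.com/new` survives: `old` is a redirect, and `copy` is skipped because its alias `new` has already been added.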
#urls_linking_to(urls) ⇒ Object
If given a single URL (as a String or URI), returns an Array of URLs which link to that URL. If given an Array of URLs, returns a Hash (URI => [URI, URI…]) of URLs linking to those URLs.

    # File 'lib/anemone/page_hash.rb', line 125
    def urls_linking_to(urls)
      unless urls.is_a?(Array)
        urls = [urls] unless urls.is_a?(Array)
        single = true
      end

      links = pages_linking_to(urls)
      links.each { |url, pages| links[url] = pages.map{|p| p.url} }

      if single and !links.empty?
        return links.first
      else
        return links
      end
    end