Class: NewsCrawler::Storage::URLQueue::URLQueueEngine
- Inherits:
- Object
- NewsCrawler::Storage::URLQueue::URLQueueEngine < Object
- Defined in:
- lib/news_crawler/storage/url_queue/url_queue_engine.rb
Overview
Base class for URLQueue engines. To create a new URLQueue engine, subclass it and implement all of its methods, keeping each method's signature unchanged.
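As a rough illustration of the subclassing contract described above, the sketch below implements a few of the methods with their signatures unchanged. It is not taken from the gem: `OverviewSketch`, `MemoryEngine`, the in-memory Hash storage, the depth bookkeeping, and the reading of a negative `max_depth` as "no limit" are all assumptions made for the example, and the base class is a stand-in so the code runs without `news_crawler` installed.

```ruby
module OverviewSketch
  # Stand-in for NewsCrawler::Storage::URLQueue::URLQueueEngine so the
  # example runs without the gem; only the methods used here are stubbed.
  class URLQueueEngine
    def add(url, ref_url = '')
      raise NotImplementedError
    end

    def clear
      raise NotImplementedError
    end

    def find_unvisited(max_depth = -1)
      raise NotImplementedError
    end

    def mark_visited(url)
      raise NotImplementedError
    end
  end

  # Hypothetical engine that keeps the queue in a Hash (url => record).
  class MemoryEngine < URLQueueEngine
    NAME = 'memory' # the real base class reads this constant in .get_engines

    def initialize
      @urls = {}
    end

    # Same signature as the base class.
    # Assumption: depth is the referrer's depth plus one, and 0 for seed URLs.
    def add(url, ref_url = '')
      return if @urls.key?(url)
      parent = @urls[ref_url]
      depth = parent ? parent[:depth] + 1 : 0
      @urls[url] = { ref_url: ref_url, depth: depth, visited: false }
    end

    # Returns the number of URLs removed (an assumption about the
    # meaning of the documented numeric return value).
    def clear
      count = @urls.size
      @urls.clear
      count
    end

    # Assumption: a negative max_depth means "no depth limit".
    def find_unvisited(max_depth = -1)
      @urls.select do |_, record|
        !record[:visited] && (max_depth < 0 || record[:depth] <= max_depth)
      end.keys
    end

    def mark_visited(url)
      @urls[url][:visited] = true if @urls.key?(url)
    end
  end
end

queue = OverviewSketch::MemoryEngine.new
queue.add('http://example.com/')
queue.add('http://example.com/a', 'http://example.com/')
queue.mark_visited('http://example.com/')
p queue.find_unvisited
```

A real engine would implement every method listed in the summary below; the sketch leaves out the per-module state methods to stay short.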
Direct Known Subclasses
Class Method Summary
-
.get_engines ⇒ Hash
Get the registered engines, keyed by name.
- .inherited(klass) ⇒ Object
Instance Method Summary
-
#add(url, ref_url = '') ⇒ Object
Add a URL together with its referring URL.
-
#all ⇒ Array
Get all URLs with their status.
-
#clear ⇒ Fixnum
Clear URLQueue.
-
#find_all(module_name, state, max_depth = -1) ⇒ Array
Find all visited URLs that are in the given state for a module.
-
#find_one(module_name, state, max_depth = -1) ⇒ String?
Find one visited URL that is in the given state for a module.
-
#find_unvisited(max_depth = -1) ⇒ Array
Get the list of unvisited URLs.
-
#mark(module_name, url, state) ⇒ Object
Set the processing state of a URL for the given module.
-
#mark_all(module_name, new_state, orig_state = nil) ⇒ Object
Move all URLs that are in one state to another state.
-
#mark_all_unvisited ⇒ Object
Mark all URLs as unvisited.
-
#mark_visited(url) ⇒ Object
Mark a URL as visited.
-
#next_unprocessed(module_name) ⇒ String?
Return the next unprocessed URL and mark it as processing.
Class Method Details
.get_engines ⇒ Hash
Get the registered engines as a Hash that maps each engine's NAME (as a symbol) to its class.
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 36
    def self.get_engines
      @engine_list = @engine_list || []
      @engine_list.inject({}) do |memo, klass|
        memo[klass::NAME.intern] = klass
        memo
      end
    end
.inherited(klass) ⇒ Object
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 30
    def self.inherited(klass)
      @engine_list = (@engine_list || []) + [klass]
    end
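The two class methods above work together as a small registry: `inherited` records each subclass as it is defined, and `get_engines` later builds a lookup table from each subclass's `NAME` constant. The sketch below reproduces that mechanism in isolation. The `RegistrySketch` module and the two engine classes are hypothetical names used only for illustration; the bodies of the two class methods are copied from the source listing.

```ruby
module RegistrySketch
  # Stand-in base class; the two class methods mirror the source listing.
  class URLQueueEngine
    # Ruby calls this hook whenever a class inherits from the receiver.
    def self.inherited(klass)
      @engine_list = (@engine_list || []) + [klass]
    end

    def self.get_engines
      @engine_list = @engine_list || []
      @engine_list.inject({}) do |memo, klass|
        memo[klass::NAME.intern] = klass
        memo
      end
    end
  end

  # Hypothetical engines; each one only needs a NAME to be listed.
  class MemoryEngine < URLQueueEngine
    NAME = 'memory'
  end

  class FileEngine < URLQueueEngine
    NAME = 'file'
  end
end

engines = RegistrySketch::URLQueueEngine.get_engines
p engines.keys                # names of the registered engines, as symbols
engine_class = engines.fetch(:memory)
p engine_class                # the class registered under :memory
```

Two details follow from the code as written. The hook runs before the subclass body, so `NAME` is read lazily in `get_engines` rather than in `inherited`. Also, `@engine_list` is an instance variable of whichever class receives the hook, so a subclass of an engine would be recorded on that engine and not on the base class.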
Instance Method Details
#add(url, ref_url = '') ⇒ Object
Add a URL together with its referring URL
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 95
    def add(url, ref_url = '')
      raise NotImplementedError
    end
#all ⇒ Array
Get all URLs with their status
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 118
    def all
      raise NotImplementedError
    end
#clear ⇒ Fixnum
Clear URLQueue
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 101
    def clear
      raise NotImplementedError
    end
#find_all(module_name, state, max_depth = -1) ⇒ Array
Find all visited URLs that are in the given state for a module
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 72
    def find_all(module_name, state, max_depth = -1)
      raise NotImplementedError
    end
#find_one(module_name, state, max_depth = -1) ⇒ String?
Find one visited URL that is in the given state for a module
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 81
    def find_one(module_name, state, max_depth = -1)
      raise NotImplementedError
    end
#find_unvisited(max_depth = -1) ⇒ Array
Get the list of unvisited URLs
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 88
    def find_unvisited(max_depth = -1)
      raise NotImplementedError
    end
#mark(module_name, url, state) ⇒ Object
Set the processing state of a URL for the given module
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 48
    def mark(module_name, url, state)
      raise NotImplementedError
    end
#mark_all(module_name, new_state, orig_state = nil) ⇒ Object
Move all URLs that are in one state to another state
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 56
    def mark_all(module_name, new_state, orig_state = nil)
      raise NotImplementedError
    end
#mark_all_unvisited ⇒ Object
Mark all URLs as unvisited
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 112
    def mark_all_unvisited
      raise NotImplementedError
    end
#mark_visited(url) ⇒ Object
Mark a URL as visited
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 107
    def mark_visited(url)
      raise NotImplementedError
    end
#next_unprocessed(module_name) ⇒ String?
Return the next unprocessed URL and mark it as processing
    # File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 63
    def next_unprocessed(module_name)
      raise NotImplementedError
    end
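To show how the per-module state methods (`mark`, `mark_all`, `find_all`, `next_unprocessed`) might fit together, here is a minimal in-memory sketch. Everything in it is an assumption made for illustration: the `StateSketch` module, the three state strings, the rule that a URL with no recorded state counts as unprocessed, and the choice to ignore both `max_depth` and the visited flag. The base class is an empty stand-in so the example runs without the gem.

```ruby
module StateSketch
  # Assumed state values; the source does not list the real ones.
  UNPROCESSED = 'unprocessed'
  PROCESSING  = 'processing'
  PROCESSED   = 'processed'

  # Empty stand-in for the real base class.
  class URLQueueEngine; end

  class MemoryEngine < URLQueueEngine
    NAME = 'memory'

    def initialize
      # url => { module_name => state }; a missing entry counts as unprocessed
      @states = {}
    end

    def add(url, ref_url = '')
      @states[url] ||= {}
    end

    def mark(module_name, url, state)
      (@states[url] ||= {})[module_name] = state
    end

    # When orig_state is nil, every URL is moved to new_state.
    def mark_all(module_name, new_state, orig_state = nil)
      @states.each_value do |by_module|
        current = by_module.fetch(module_name, UNPROCESSED)
        by_module[module_name] = new_state if orig_state.nil? || current == orig_state
      end
    end

    # max_depth is accepted to keep the signature but is ignored in this sketch.
    def find_all(module_name, state, max_depth = -1)
      @states.select { |_, by_module| by_module.fetch(module_name, UNPROCESSED) == state }.keys
    end

    # Returns nil once no unprocessed URL is left for the module.
    def next_unprocessed(module_name)
      url = find_all(module_name, UNPROCESSED).first
      mark(module_name, url, PROCESSING) if url
      url
    end
  end
end

queue = StateSketch::MemoryEngine.new
queue.add('http://example.com/a')
queue.add('http://example.com/b')

# A hypothetical 'parser' module drains the queue one URL at a time.
while (url = queue.next_unprocessed('parser'))
  queue.mark('parser', url, StateSketch::PROCESSED)
end
p queue.find_all('parser', StateSketch::PROCESSED)
```

Marking a URL as processing inside `next_unprocessed` keeps a second caller from being handed the same URL. A `mark_all(module_name, UNPROCESSED, PROCESSING)` call could then requeue work left in the processing state after a crash. A shared or persistent engine would need to make the find and mark steps atomic, which this sketch does not attempt.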