Module: NewsCrawler::CrawlerModule
Defined in: lib/news_crawler/crawler_module.rb
Overview
Include this module in a class to get the basic crawler module methods.
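Every method listed on this page passes self.class.name to the underlying queue or store, so state is tracked per including class. The following is a minimal sketch of that pattern, not the gem's code: SketchCrawlerModule, module_key, LinkSelector and ContentExtractor are hypothetical names used only for illustration.

```ruby
# Hypothetical stand-in for NewsCrawler::CrawlerModule (illustration only).
module SketchCrawlerModule
  # Mirrors how the documented methods identify the caller:
  # they pass self.class.name as the module name.
  def module_key
    self.class.name
  end
end

# Two hypothetical crawler modules that include the mixin.
class LinkSelector
  include SketchCrawlerModule
end

class ContentExtractor
  include SketchCrawlerModule
end

puts LinkSelector.new.module_key      # prints "LinkSelector"
puts ContentExtractor.new.module_key  # prints "ContentExtractor"
```

Because the key is the class name, two classes that include the module would keep independent process states for the same URL.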
Instance Method Summary
- #find_all(state, max_depth = -1) ⇒ Array
  Find all visited URLs whose process state for the current module matches the given state.
- #find_one(state, max_depth = -1) ⇒ String?
  Find one visited URL whose process state for the current module matches the given state.
- #find_unprocessed(max_depth = -1) ⇒ Array
  Find all visited URLs that the current module has not yet processed.
- #load_yaml(key, value) ⇒ Object?
  Load a YAML object.
- #mark_all_as_unprocessed ⇒ Object
- #mark_processed(url) ⇒ Object
  Mark the given URL as processed for the current module.
- #mark_unprocessed(url) ⇒ Object
  Mark the given URL as unprocessed for the current module.
- #next_unprocessed(max_depth = -1) ⇒ String?
  Get the next unprocessed URL and atomically mark it as processing.
- #save_yaml(key, value) ⇒ Object
  Serialize an object to YAML and save it, overwriting any existing value for the key.
Instance Method Details
#find_all(state, max_depth = -1) ⇒ Array
Find all visited URLs whose process state for the current module matches the given state.
# File 'lib/news_crawler/crawler_module.rb', line 53
def find_all(state, max_depth = -1)
  URLQueue.find_all(self.class.name, state, max_depth)
end
#find_one(state, max_depth = -1) ⇒ String?
Find one visited URL whose process state for the current module matches the given state.
# File 'lib/news_crawler/crawler_module.rb', line 61
def find_one(state, max_depth = -1)
  URLQueue.find_one(self.class.name, state, max_depth)
end
#find_unprocessed(max_depth = -1) ⇒ Array
Find all visited URLs that the current module has not yet processed.
# File 'lib/news_crawler/crawler_module.rb', line 45
def find_unprocessed(max_depth = -1)
  URLQueue.find_all(self.class.name, URLQueue::UNPROCESSED, max_depth)
end
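The lookup methods above all delegate to URLQueue with the module name, a state, and max_depth. Below is a rough in-memory sketch of what such a lookup could look like. SketchQueue, its add method, the row layout, and the string values of the state constants are hypothetical. The reading of a negative max_depth as "no depth limit" is an assumption that the source does not confirm.

```ruby
# Hypothetical in-memory stand-in for URLQueue; the real storage is not shown here.
class SketchQueue
  UNPROCESSED = "unprocessed"
  PROCESSED   = "processed"

  def initialize
    @rows = [] # each row: { url:, depth:, states: { module_name => state } }
  end

  # Register a visited URL; every module starts out seeing it as unprocessed.
  def add(url, depth)
    @rows << { url: url, depth: depth, states: Hash.new(UNPROCESSED) }
  end

  def mark(module_name, url, state)
    row = @rows.find { |r| r[:url] == url }
    row[:states][module_name] = state if row
  end

  # Assumption: a negative max_depth means "no depth limit".
  def find_all(module_name, state, max_depth = -1)
    @rows.select { |r| r[:states][module_name] == state }
         .select { |r| max_depth.negative? || r[:depth] <= max_depth }
         .map { |r| r[:url] }
  end
end

queue = SketchQueue.new
queue.add("http://example.com/", 0)
queue.add("http://example.com/a", 1)
queue.add("http://example.com/a/b", 2)
queue.mark("Extractor", "http://example.com/a", SketchQueue::PROCESSED)

# State is tracked per module name, so "Extractor" sees two unprocessed URLs.
unprocessed = queue.find_all("Extractor", SketchQueue::UNPROCESSED)
# With a depth limit of 0, only the root URL qualifies.
shallow = queue.find_all("Extractor", SketchQueue::UNPROCESSED, 0)
```

In this sketch a second module name, such as "Other", would still see all three URLs as unprocessed, since marking is scoped to one module.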
#load_yaml(key, value) ⇒ Object?
Load YAML object
# File 'lib/news_crawler/crawler_module.rb', line 86
def load_yaml(key, value)
  YAMLStor.get(self.class.name, key, value)
end
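As an illustration of the save and load round trip, here is a small sketch that uses Ruby's standard YAML library with an in-memory Hash. SketchYamlStore and its save and load methods are hypothetical and do not reflect how YAMLStor is implemented. The sketch also does not model the second value argument of load_yaml, whose role the source does not explain.

```ruby
require "yaml"

# Hypothetical in-memory stand-in for YAMLStor, keyed by module name and key.
class SketchYamlStore
  def initialize
    @data = {}
  end

  # Serialize the value to YAML text; an earlier value under the same key is overwritten.
  def save(module_name, key, value)
    @data[[module_name, key]] = YAML.dump(value)
  end

  # Parse the stored YAML text; returns nil when nothing was saved under this key.
  def load(module_name, key)
    text = @data[[module_name, key]]
    text && YAML.load(text)
  end
end

store = SketchYamlStore.new
store.save("Extractor", "settings", { "retries" => 3, "hosts" => ["example.com"] })
store.save("Extractor", "settings", { "retries" => 5, "hosts" => ["example.com"] })

settings = store.load("Extractor", "settings") # the second save replaced the first
missing  = store.load("Extractor", "unknown")  # nothing stored under this key
```

The nil result for a missing key matches the Object? return type documented for load_yaml, although the real store may decide this differently.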
#mark_all_as_unprocessed ⇒ Object
# File 'lib/news_crawler/crawler_module.rb', line 72
def mark_all_as_unprocessed
  URLQueue.mark_all(self.class.name, URLQueue::UNPROCESSED)
end
#mark_processed(url) ⇒ Object
Mark the given URL as processed for the current module.
# File 'lib/news_crawler/crawler_module.rb', line 32
def mark_processed(url)
  URLQueue.mark(self.class.name, url, URLQueue::PROCESSED)
end
#mark_unprocessed(url) ⇒ Object
Mark the given URL as unprocessed for the current module.
# File 'lib/news_crawler/crawler_module.rb', line 38
def mark_unprocessed(url)
  URLQueue.mark(self.class.name, url, URLQueue::UNPROCESSED)
end
#next_unprocessed(max_depth = -1) ⇒ String?
Get the next unprocessed URL and atomically mark it as processing.
# File 'lib/news_crawler/crawler_module.rb', line 68
def next_unprocessed(max_depth = -1)
  URLQueue.next_unprocessed(self.class.name, max_depth)
end
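Taking a URL and flagging it as processing in one atomic step matters when several workers drain the same queue. The sketch below shows one possible way to get that behaviour with a Mutex, followed by a typical worker loop. SketchQueue, its count helper, and the PROCESSING constant are hypothetical. The real URLQueue may achieve atomicity through its storage backend instead.

```ruby
# Hypothetical in-memory stand-in for URLQueue (illustration only).
class SketchQueue
  UNPROCESSED = "unprocessed"
  PROCESSING  = "processing" # assumed name; the source only says "processing"
  PROCESSED   = "processed"

  def initialize(urls)
    @states = urls.to_h { |u| [u, UNPROCESSED] }
    @lock = Mutex.new
  end

  # Find an unprocessed URL and flag it as processing under one lock,
  # so two workers never receive the same URL.
  def next_unprocessed
    @lock.synchronize do
      url, _state = @states.find { |_u, s| s == UNPROCESSED }
      @states[url] = PROCESSING if url
      url # nil once nothing is left
    end
  end

  def mark(url, state)
    @lock.synchronize { @states[url] = state }
  end

  def count(state)
    @lock.synchronize { @states.count { |_u, s| s == state } }
  end
end

queue = SketchQueue.new(%w[http://example.com/1 http://example.com/2 http://example.com/3])
handled = Queue.new # thread-safe collector from Ruby's core library

# Two workers pull URLs until the queue reports nothing left.
workers = 2.times.map do
  Thread.new do
    while (url = queue.next_unprocessed)
      handled << url
      queue.mark(url, SketchQueue::PROCESSED)
    end
  end
end
workers.each(&:join)
```

With the real module, the equivalent loop inside an including class would call next_unprocessed and then mark_processed(url) once the work for that URL is done.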