Class: NewsCrawler::Storage::URLQueue::URLQueueEngine

Inherits:
Object
Defined in:
lib/news_crawler/storage/url_queue/url_queue_engine.rb

Overview

Base class for URLQueue engines. To create a new URLQueue engine, subclass it and implement all of its methods, keeping each method's signature unchanged.
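
As a minimal sketch of this subclassing contract: `StubQueueEngine` below is a stand-in that copies three of the method stubs from this page so the snippet runs without the gem, and `HashEngine` with its `@urls` hash is a hypothetical engine, not part of NewsCrawler. It overrides `add` and `clear` with unchanged signatures and deliberately leaves `all` unimplemented.

```ruby
# Stand-in for URLQueueEngine (assumption: only three of the documented
# stubs are reproduced here).
class StubQueueEngine
  def add(url, ref_url = '')
    raise NotImplementedError
  end

  def clear
    raise NotImplementedError
  end

  def all
    raise NotImplementedError
  end
end

# Hypothetical engine that keeps URLs in a Hash (url => ref_url).
class HashEngine < StubQueueEngine
  NAME = 'hash'

  def initialize
    @urls = {}
  end

  # Same signature as the base class.
  def add(url, ref_url = '')
    @urls[url] = ref_url
  end

  # Returns the number of URLs removed, as the base class documents.
  def clear
    removed = @urls.size
    @urls.clear
    removed
  end
end

engine = HashEngine.new
engine.add('http://example.com/')
engine.add('http://example.com/news', 'http://example.com/')
removed = engine.clear

# `all` was not overridden, so the base stub still raises.
not_implemented = begin
  engine.all
  false
rescue NotImplementedError
  true
end
```

In Ruby, `NotImplementedError` descends from `ScriptError` rather than `StandardError`, so a bare `rescue` does not catch it. Callers that probe for missing methods have to name the exception explicitly, as above.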

Direct Known Subclasses

MongoEngine

Class Method Details

.get_engines ⇒ Hash

Get the registered engines

Returns:

  • (Hash)

    URL queue engine classes, keyed by each engine's NAME as a Symbol



# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 36

def self.get_engines
  @engine_list = @engine_list || []
  @engine_list.inject({}) do | memo, klass |
    memo[klass::NAME.intern] = klass
    memo
  end
end

.inherited(klass) ⇒ Object

Register a new subclass in the engine list (Ruby calls this hook whenever the class is subclassed)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 30

def self.inherited(klass)
  @engine_list = (@engine_list || []) + [klass]
end
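
The two class methods above form a small registry: `inherited` records every direct subclass, and `get_engines` turns that list into a lookup table. The sketch below copies both methods into a stand-in class so it runs without the gem; `DemoEngineBase`, `RamEngine` and `DiskEngine` are hypothetical names used only for illustration.

```ruby
# Stand-in base class carrying the two class methods listed above.
class DemoEngineBase
  def self.inherited(klass)
    @engine_list = (@engine_list || []) + [klass]
  end

  def self.get_engines
    @engine_list = @engine_list || []
    @engine_list.inject({}) do |memo, klass|
      memo[klass::NAME.intern] = klass
      memo
    end
  end
end

# The hook fires at `class ... < DemoEngineBase`, before NAME exists.
# That is fine because get_engines reads NAME only when it is called.
class RamEngine < DemoEngineBase
  NAME = 'ram'
end

class DiskEngine < DemoEngineBase
  NAME = 'disk'
end

engines = DemoEngineBase.get_engines
engine_class = engines[:ram] # => RamEngine
```

Because the table is keyed by `NAME`, a subclass that does not define that constant would make `get_engines` raise a `NameError`.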

Instance Method Details

#add(url, ref_url = '') ⇒ Object

Add a URL together with its reference URL

Parameters:

  • url (String)

    URL

  • ref_url (String) (defaults to: '')

    reference URL

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 95

def add(url, ref_url = '')
  raise NotImplementedError
end

#all ⇒ Array

Get all URLs with their status

Returns:

  • (Array)

    URL list

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 118

def all
  raise NotImplementedError
end

#clear ⇒ Fixnum

Clear URLQueue

Returns:

  • (Fixnum)

    number of urls removed

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 101

def clear
  raise NotImplementedError
end

#find_all(module_name, state, max_depth = -1) ⇒ Array

Find all visited URLs that are in the given processing state for a module

Parameters:

  • module_name (String)
  • state (String)
  • max_depth (Fixnum) (defaults to: -1)

    maximum depth of the URLs returned (inclusive)

Returns:

  • (Array)

    URL list

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 72

def find_all(module_name, state, max_depth = -1)
  raise NotImplementedError
end

#find_one(module_name, state, max_depth = -1) ⇒ String?

Find one visited URL that is in the given processing state for a module

Parameters:

  • module_name (String)
  • state (String)

    one of unprocessed, processing, processed

  • max_depth (Fixnum) (defaults to: -1)

    maximum depth of the URL returned (inclusive)

Returns:

  • (String, nil)

    URL

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 81

def find_one(module_name, state, max_depth = -1)
  raise NotImplementedError
end
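
The following sketch shows one way an engine might satisfy the `find_all` and `find_one` contracts. Everything in it is hypothetical: the `QueueEntry` record, the `DepthFilterDemo` class and the `matching` helper are not part of NewsCrawler, and treating a negative `max_depth` as "no depth limit" is an assumption suggested by the default of -1, not something this page states.

```ruby
# Hypothetical record layout; the real storage schema is not documented here.
QueueEntry = Struct.new(:url, :depth, :visited, :states)

class DepthFilterDemo
  def initialize(entries)
    @entries = entries
  end

  def find_all(module_name, state, max_depth = -1)
    matching(module_name, state, max_depth).map(&:url)
  end

  def find_one(module_name, state, max_depth = -1)
    entry = matching(module_name, state, max_depth).first
    entry && entry.url
  end

  private

  # Assumption: a negative max_depth disables the depth filter.
  def matching(module_name, state, max_depth)
    @entries.select do |e|
      e.visited &&
        e.states[module_name] == state &&
        (max_depth < 0 || e.depth <= max_depth)
    end
  end
end

queue = DepthFilterDemo.new([
  QueueEntry.new('http://example.com/',  0, true,  { 'parser' => 'unprocessed' }),
  QueueEntry.new('http://example.com/a', 1, true,  { 'parser' => 'unprocessed' }),
  QueueEntry.new('http://example.com/b', 2, true,  { 'parser' => 'processed' }),
  QueueEntry.new('http://example.com/c', 1, false, { 'parser' => 'unprocessed' })
])

shallow    = queue.find_all('parser', 'unprocessed', 0) # depth 0 only
all_depths = queue.find_all('parser', 'unprocessed')    # no depth limit
none       = queue.find_one('parser', 'processing')     # nothing matches
```

The unvisited entry `/c` is excluded from both results because these two methods are documented as searching visited URLs only.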

#find_unvisited(max_depth = -1) ⇒ Array

Get the list of unvisited URLs

Parameters:

  • max_depth (Fixnum) (defaults to: -1)

    maximum depth of the URLs returned

Returns:

  • (Array)

    unvisited URLs, optionally limited to the maximum depth

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 88

def find_unvisited(max_depth = -1)
  raise NotImplementedError
end

#mark(module_name, url, state) ⇒ Object

Set the processing state of a URL for the given module

Parameters:

  • module_name (String)
  • url (String)
  • state (String)

    one of unprocessed, processing, processed

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 48

def mark(module_name, url, state)
  raise NotImplementedError
end

#mark_all(module_name, new_state, orig_state = nil) ⇒ Object

Change all URLs in one state to another state

Parameters:

  • module_name (String)
  • new_state (String)

    new state

  • orig_state (String) (defaults to: nil)

    original state

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 56

def mark_all(module_name, new_state, orig_state = nil)
  raise NotImplementedError
end
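
A typical use of `mark_all` would be resetting URLs stuck in `processing` after a crash. The sketch below illustrates that with a hypothetical in-memory `StateTableDemo`; the `state_of` reader is a helper added for the example only, and reading `orig_state = nil` as "every URL, whatever its current state" is an assumption, since this page does not define what nil means.

```ruby
# Hypothetical in-memory state table: { module_name => { url => state } }.
class StateTableDemo
  def initialize
    @states = Hash.new { |hash, key| hash[key] = {} }
  end

  def mark(module_name, url, state)
    @states[module_name][url] = state
  end

  # Assumption: orig_state = nil matches every URL of the module.
  def mark_all(module_name, new_state, orig_state = nil)
    table = @states[module_name]
    table.each_key do |url|
      table[url] = new_state if orig_state.nil? || table[url] == orig_state
    end
  end

  # Illustration-only reader, not part of the documented interface.
  def state_of(module_name, url)
    @states[module_name][url]
  end
end

table = StateTableDemo.new
table.mark('parser', 'http://example.com/a', 'processing')
table.mark('parser', 'http://example.com/b', 'processed')

# Reset only the URLs that were left in 'processing'.
table.mark_all('parser', 'unprocessed', 'processing')
after_reset = [
  table.state_of('parser', 'http://example.com/a'),
  table.state_of('parser', 'http://example.com/b')
]

# Without orig_state, every URL of the module changes state.
table.mark_all('parser', 'processed')
after_blanket = [
  table.state_of('parser', 'http://example.com/a'),
  table.state_of('parser', 'http://example.com/b')
]
```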

#mark_all_unvisited ⇒ Object

Mark all URLs as unvisited

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 112

def mark_all_unvisited
  raise NotImplementedError
end

#mark_visited(url) ⇒ Object

Mark a URL as visited

Parameters:

  • url (String)

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 107

def mark_visited(url)
  raise NotImplementedError
end

#next_unprocessed(module_name) ⇒ String?

Return the next unprocessed URL and mark it as processing

Parameters:

  • module_name (String)

Returns:

  • (String, nil)

Raises:

  • (NotImplementedError)


# File 'lib/news_crawler/storage/url_queue/url_queue_engine.rb', line 63

def next_unprocessed(module_name)
  raise NotImplementedError
end
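
Because `next_unprocessed` both finds a URL and changes its state, an implementation probably needs to make the two steps atomic so that concurrent workers of the same module never claim the same URL. The sketch below does this with a `Mutex` around an in-memory table. `ClaimDemo` is hypothetical, as is the assumption that a URL with no recorded state counts as unprocessed; a database-backed engine would more likely rely on an atomic find-and-update operation of its store.

```ruby
# Hypothetical in-memory queue with per-module processing states.
class ClaimDemo
  def initialize(urls)
    @urls = urls
    @states = {} # { module_name => { url => state } }
    @lock = Mutex.new
  end

  def next_unprocessed(module_name)
    # Find and mark under one lock so no other thread can claim the same URL.
    @lock.synchronize do
      table = (@states[module_name] ||= {})
      # Assumption: a URL with no recorded state is unprocessed.
      url = @urls.find { |u| table.fetch(u, 'unprocessed') == 'unprocessed' }
      table[url] = 'processing' if url
      url
    end
  end
end

queue = ClaimDemo.new(['http://example.com/a', 'http://example.com/b'])
first  = queue.next_unprocessed('parser')  # claims /a
second = queue.next_unprocessed('parser')  # claims /b
third  = queue.next_unprocessed('parser')  # nothing left for this module
other  = queue.next_unprocessed('indexer') # states are tracked per module
```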