Class: Wgit::Database
- Inherits:
-
Object
- Object
- Wgit::Database
- Includes:
- Assertable
- Defined in:
- lib/wgit/database/database.rb
Overview
Class modeling a DB connection and CRUD operations for the Url and Document collections.
Constant Summary
Constants included from Assertable
Assertable::DEFAULT_DUCK_FAIL_MSG, Assertable::DEFAULT_REQUIRED_KEYS_MSG, Assertable::DEFAULT_TYPE_FAIL_MSG, Assertable::WRONG_METHOD_MSG
Instance Attribute Summary
-
#client ⇒ Object
readonly
The database client object.
-
#connection_string ⇒ Object
readonly
The connection string for the database.
Class Method Summary
-
.connect(connection_string = nil) ⇒ Wgit::Database
A class alias for Database.new.
-
.establish_connection(connection_string) ⇒ Mongo::Client
Initializes a connected database client using the connection string.
Instance Method Summary
-
#crawled_urls(limit: 0, skip: 0) {|url| ... } ⇒ Array<Wgit::Url>
Returns Url records that have been crawled.
-
#doc?(doc) ⇒ Boolean
(also: #document?)
Returns whether a record matching the given doc's 'url.url' field (which is unique) exists in the database's 'documents' collection.
-
#initialize(connection_string = nil) ⇒ Database
constructor
Initializes a connected database client using the provided connection_string or ENV['WGIT_CONNECTION_STRING'].
-
#insert(data) ⇒ Object
Insert one or more Url or Document objects into the DB.
-
#insert_docs(data) ⇒ Integer
(also: #insert_doc)
protected
Insert one or more Document objects into the DB.
-
#insert_urls(data) ⇒ Integer
(also: #insert_url)
protected
Insert one or more Url objects into the DB.
-
#num_docs ⇒ Integer
(also: #num_documents)
Returns the total number of Document records in the DB.
-
#num_records ⇒ Integer
(also: #num_objects)
Returns the total number of records (urls + docs) in the DB.
-
#num_urls ⇒ Integer
Returns the total number of URL records in the DB.
-
#search(query, case_sensitive: false, whole_sentence: true, limit: 10, skip: 0) {|doc| ... } ⇒ Array<Wgit::Document>
Searches the database's Documents for the given query.
-
#size ⇒ Integer
(also: #count, #length)
Returns the current size of the database.
-
#stats ⇒ BSON::Document#[]#fetch
Returns statistics about the database.
-
#uncrawled_urls(limit: 0, skip: 0) {|url| ... } ⇒ Array<Wgit::Url>
Returns Url records that haven't yet been crawled.
-
#update(data) ⇒ Object
Update a Url or Document object in the DB.
-
#update_doc(doc) ⇒ Integer
protected
Update a Document record in the DB.
-
#update_url(url) ⇒ Integer
protected
Update a Url record in the DB.
-
#url?(url) ⇒ Boolean
Returns whether a record matching the given 'url' field (which is unique) exists in the database's 'urls' collection.
-
#urls(crawled: nil, limit: 0, skip: 0) {|url| ... } ⇒ Array<Wgit::Url>
Returns Url records from the DB.
Methods included from Assertable
#assert_arr_types, #assert_required_keys, #assert_respond_to, #assert_types
Constructor Details
#initialize(connection_string = nil) ⇒ Database
Initializes a connected database client using the provided connection_string or ENV['WGIT_CONNECTION_STRING'].
    # File 'lib/wgit/database/database.rb', line 30

    def initialize(connection_string = nil)
      connection_string ||= ENV['WGIT_CONNECTION_STRING']
      raise "connection_string and ENV['WGIT_CONNECTION_STRING'] are nil" \
      unless connection_string

      @client = Database.establish_connection(connection_string)
      @connection_string = connection_string
    end
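The constructor's fallback order can be sketched in isolation. The helper name `resolve_connection_string` and the example connection strings are hypothetical and not part of Wgit; the sketch only mirrors the `||=` and `raise` logic of the constructor and does not open a connection.

```ruby
# Hypothetical helper (not part of Wgit) mirroring the constructor's fallback:
# prefer the explicit argument, else the environment variable, else raise.
def resolve_connection_string(connection_string = nil, env = ENV)
  connection_string ||= env['WGIT_CONNECTION_STRING']
  raise "connection_string and ENV['WGIT_CONNECTION_STRING'] are nil" \
    unless connection_string

  connection_string
end

# A plain Hash stands in for ENV so the example has no side effects.
fake_env = { 'WGIT_CONNECTION_STRING' => 'mongodb://localhost:27017/example' }

explicit = resolve_connection_string('mongodb://localhost:27017/other', fake_env)
from_env = resolve_connection_string(nil, fake_env)
puts explicit
puts from_env
```

With neither an argument nor the environment variable present, the helper raises a RuntimeError, as the constructor does.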
Instance Attribute Details
#client ⇒ Object (readonly)
The database client object. Gets set when a connection is established.
    # File 'lib/wgit/database/database.rb', line 21

    def client
      @client
    end
#connection_string ⇒ Object (readonly)
The connection string for the database.
    # File 'lib/wgit/database/database.rb', line 18

    def connection_string
      @connection_string
    end
Class Method Details
.connect(connection_string = nil) ⇒ Wgit::Database
A class alias for Database.new.
    # File 'lib/wgit/database/database.rb', line 46

    def self.connect(connection_string = nil)
      new(connection_string)
    end
.establish_connection(connection_string) ⇒ Mongo::Client
Initializes a connected database client using the connection string.
    # File 'lib/wgit/database/database.rb', line 56

    def self.establish_connection(connection_string)
      # Only log for error (and more severe) scenarios.
      Mongo::Logger.logger = Wgit.logger.clone
      Mongo::Logger.logger.progname = 'mongo'
      Mongo::Logger.logger.level = Logger::ERROR

      # Connects to the database here.
      Mongo::Client.new(connection_string)
    end
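The logger setup in this method follows a common pattern: clone an existing logger, then change the clone's name and level so the original is untouched. Below is a minimal sketch using only Ruby's standard Logger. The helper name `quiet_clone` is hypothetical, and a logger created with `Logger.new(nil)` stands in for `Wgit.logger`.

```ruby
require 'logger'

# Hypothetical helper (not part of Wgit or the mongo gem): returns a copy of
# base_logger that is labelled with progname and only emits ERROR and above.
def quiet_clone(base_logger, progname)
  logger = base_logger.clone
  logger.progname = progname
  logger.level = Logger::ERROR
  logger
end

app_logger = Logger.new(nil) # Stand-in for Wgit.logger; nil discards output.
mongo_logger = quiet_clone(app_logger, 'mongo')

puts mongo_logger.error? # ERROR messages pass the level check
puts mongo_logger.info?  # INFO messages do not
```

Because the level and progname are assigned on the clone, the original logger keeps its own settings.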
Instance Method Details
#crawled_urls(limit: 0, skip: 0) {|url| ... } ⇒ Array<Wgit::Url>
Returns Url records that have been crawled.
    # File 'lib/wgit/database/database.rb', line 122

    def crawled_urls(limit: 0, skip: 0, &block)
      urls(crawled: true, limit: limit, skip: skip, &block)
    end
#doc?(doc) ⇒ Boolean Also known as: document?
Returns whether a record matching the given doc's 'url.url' field (which is unique) exists in the database's 'documents' collection.

    # File 'lib/wgit/database/database.rb', line 233

    def doc?(doc)
      assert_type(doc, Wgit::Document)
      hash = { 'url.url' => doc.url }
      @client[:documents].find(hash).any?
    end
#insert(data) ⇒ Object
Insert one or more Url or Document objects into the DB.
    # File 'lib/wgit/database/database.rb', line 74

    def insert(data)
      data = data.dup # Avoid modifying by reference.
      type = data.is_a?(Enumerable) ? data.first : data

      case type
      when Wgit::Url
        insert_urls(data)
      when Wgit::Document
        insert_docs(data)
      else
        raise "Unsupported type - #{data.class}: #{data}"
      end
    end
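The dispatch in #insert looks at the first element of a collection (or at the object itself) and routes on its class. The sketch below reproduces only that decision. `FakeUrl`, `FakeDoc` and `insert_target` are hypothetical stand-ins; the real method checks for Wgit::Url and Wgit::Document and calls #insert_urls or #insert_docs.

```ruby
# Stand-in types for illustration only; they are not Wgit classes.
FakeUrl = Class.new
FakeDoc = Class.new

# Hypothetical helper mirroring #insert's dispatch: sample the first element
# of a collection (or the object itself) and pick a collection by its class.
def insert_target(data)
  sample = data.is_a?(Enumerable) ? data.first : data

  case sample
  when FakeUrl then :urls
  when FakeDoc then :documents
  else raise "Unsupported type - #{data.class}: #{data}"
  end
end

puts insert_target(FakeUrl.new)                # a single object
puts insert_target([FakeDoc.new, FakeDoc.new]) # a collection
```

Only the first element is sampled, so an empty collection has nothing to sample and falls through to the error branch under this logic.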
#insert_docs(data) ⇒ Integer (protected) Also known as: insert_doc
Insert one or more Document objects into the DB.
    # File 'lib/wgit/database/database.rb', line 283

    def insert_docs(data)
      if data.respond_to?(:map)
        assert_arr_types(data, Wgit::Document)
        data.map! { |doc| Wgit::Model.document(doc) }
      else
        assert_types(data, Wgit::Document)
        data = Wgit::Model.document(data)
      end

      create(:documents, data)
    end
#insert_urls(data) ⇒ Integer (protected) Also known as: insert_url
Insert one or more Url objects into the DB.
    # File 'lib/wgit/database/database.rb', line 265

    def insert_urls(data)
      if data.respond_to?(:map)
        assert_arr_type(data, Wgit::Url)
        data.map! { |url| Wgit::Model.url(url) }
      else
        assert_type(data, Wgit::Url)
        data = Wgit::Model.url(data)
      end

      create(:urls, data)
    end
#num_docs ⇒ Integer Also known as: num_documents
Returns the total number of Document records in the DB.
    # File 'lib/wgit/database/database.rb', line 206

    def num_docs
      @client[:documents].count
    end
#num_records ⇒ Integer Also known as: num_objects
Returns the total number of records (urls + docs) in the DB.
    # File 'lib/wgit/database/database.rb', line 213

    def num_records
      num_urls + num_docs
    end
#num_urls ⇒ Integer
Returns the total number of URL records in the DB.
    # File 'lib/wgit/database/database.rb', line 199

    def num_urls
      @client[:urls].count
    end
#search(query, case_sensitive: false, whole_sentence: true, limit: 10, skip: 0) {|doc| ... } ⇒ Array<Wgit::Document>
Searches the database's Documents for the given query.
The searched fields are determined by the text index set up on the documents collection. By default, the following fields are searched: "author", "keywords", "title" and "text".
The MongoDB search algorithm ranks/sorts the results in order (highest first) based on each document's "textScore" (which records the number of query hits). The "textScore" is then stored in each Document result object for use elsewhere if needed; accessed via Wgit::Document#score.
    # File 'lib/wgit/database/database.rb', line 156

    def search(
      query, case_sensitive: false, whole_sentence: true, limit: 10, skip: 0
    )
      query = query.to_s.strip
      query.replace('"' + query + '"') if whole_sentence

      # Sort based on the most search hits (aka "textScore").
      # We use the sort_proj hash as both a sort and a projection below.
      sort_proj = { score: { :$meta => 'textScore' } }
      query = { :$text => {
        :$search => query, :$caseSensitive => case_sensitive
      } }

      results = retrieve(:documents, query,
                         sort: sort_proj, projection: sort_proj,
                         limit: limit, skip: skip)
      return [] if results.count < 1 # respond_to? :empty? == false

      # results.respond_to? :map! is false so we use map and overwrite the var.
      results = results.map { |mongo_doc| Wgit::Document.new(mongo_doc) }
      results.each { |doc| yield(doc) } if block_given?

      results
    end
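To see what #search actually sends to MongoDB, the query-building step can be pulled out on its own. The helper name `build_text_search` is hypothetical and not part of Wgit; it only reproduces the hashes built in the method body and does not contact a database.

```ruby
# Hypothetical helper (not part of Wgit) reproducing how #search shapes its
# arguments into a MongoDB $text filter and a textScore sort/projection.
def build_text_search(query, case_sensitive: false, whole_sentence: true)
  query = query.to_s.strip
  # Wrapping the query in double quotes asks MongoDB for a phrase match.
  query = "\"#{query}\"" if whole_sentence

  sort_proj = { score: { :$meta => 'textScore' } }
  filter = { :$text => { :$search => query, :$caseSensitive => case_sensitive } }

  # The same hash serves as both the sort and the projection.
  { filter: filter, sort: sort_proj, projection: sort_proj }
end

phrase = build_text_search('  ruby crawler ')
words  = build_text_search('ruby crawler', whole_sentence: false)

puts phrase[:filter][:$text][:$search]
puts words[:filter][:$text][:$search]
```

With `whole_sentence: true` the stripped query is quoted and treated as one phrase; with `whole_sentence: false` it is passed through as individual terms.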
#size ⇒ Integer Also known as: count, length
Returns the current size of the database, as reported by the dataSize field of #stats.

    # File 'lib/wgit/database/database.rb', line 192

    def size
      stats[:dataSize]
    end
#stats ⇒ BSON::Document#[]#fetch
Returns statistics about the database.
    # File 'lib/wgit/database/database.rb', line 185

    def stats
      @client.command(dbStats: 0).documents[0]
    end
#uncrawled_urls(limit: 0, skip: 0) {|url| ... } ⇒ Array<Wgit::Url>
Returns Url records that haven't yet been crawled.

    # File 'lib/wgit/database/database.rb', line 132

    def uncrawled_urls(limit: 0, skip: 0, &block)
      urls(crawled: false, limit: limit, skip: skip, &block)
    end
#update(data) ⇒ Object
Update a Url or Document object in the DB.
    # File 'lib/wgit/database/database.rb', line 245

    def update(data)
      data = data.dup # Avoid modifying by reference.

      case data
      when Wgit::Url
        update_url(data)
      when Wgit::Document
        update_doc(data)
      else
        raise "Unsupported type - #{data.class}: #{data}"
      end
    end
#update_doc(doc) ⇒ Integer (protected)
Update a Document record in the DB.
    # File 'lib/wgit/database/database.rb', line 311

    def update_doc(doc)
      assert_type(doc, Wgit::Document)
      selection = { 'url.url' => doc.url }
      doc_hash = Wgit::Model.document(doc).merge(Wgit::Model.common_update_data)
      update = { '$set' => doc_hash }
      mutate(true, :documents, selection, update)
    end
#update_url(url) ⇒ Integer (protected)
Update a Url record in the DB.
    # File 'lib/wgit/database/database.rb', line 299

    def update_url(url)
      assert_type(url, Wgit::Url)
      selection = { url: url }
      url_hash = Wgit::Model.url(url).merge(Wgit::Model.common_update_data)
      update = { '$set' => url_hash }
      mutate(true, :urls, selection, update)
    end
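Both #update_url and #update_doc build a selection hash that identifies the record and a '$set' update hash that carries the new field values. A rough sketch of that shape follows. The helper name `build_url_update` is hypothetical, and `common_data` is a placeholder for Wgit::Model.common_update_data, whose contents are not shown on this page.

```ruby
# Hypothetical helper (not part of Wgit): builds the selection and '$set'
# update hashes in the same shape as #update_url. common_data is a
# placeholder for Wgit::Model.common_update_data.
def build_url_update(url, url_fields, common_data = {})
  selection = { url: url }
  update    = { '$set' => url_fields.merge(common_data) }
  [selection, update]
end

selection, update = build_url_update(
  'http://example.com',
  { url: 'http://example.com', crawled: true },
  { example_field: 'placeholder' } # Made-up field, for illustration only.
)

p selection
p update
```

Using '$set' means only the listed fields are overwritten on the matched record; fields absent from the hash are left as they are.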
#url?(url) ⇒ Boolean
Returns whether a record matching the given 'url' field (which is unique) exists in the database's 'urls' collection.

    # File 'lib/wgit/database/database.rb', line 222

    def url?(url)
      assert_type(url, String) # This includes Wgit::Url's.
      hash = { 'url' => url }
      @client[:urls].find(hash).any?
    end
#urls(crawled: nil, limit: 0, skip: 0) {|url| ... } ⇒ Array<Wgit::Url>
Returns Url records from the DB.
All Urls are sorted by date_added ascending; in other words, the first url returned is the first one that was inserted into the DB.

    # File 'lib/wgit/database/database.rb', line 100

    def urls(crawled: nil, limit: 0, skip: 0)
      query = crawled.nil? ? {} : { crawled: crawled }
      sort = { date_added: 1 }

      results = retrieve(:urls, query,
                         sort: sort, projection: {},
                         limit: limit, skip: skip)
      return [] if results.count < 1 # results#empty? doesn't exist.

      # results.respond_to? :map! is false so we use map and overwrite the var.
      results = results.map { |url_doc| Wgit::Url.new(url_doc) }
      results.each { |url| yield(url) } if block_given?

      results
    end
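The combined effect of the `crawled`, `limit` and `skip` options can be approximated without a database. In the sketch below, `SAMPLE_URL_RECORDS` and `select_urls` are hypothetical, the records are made up, and an Array of Hashes stands in for the 'urls' collection. A limit of 0 is treated as "no limit", which is how MongoDB interprets it.

```ruby
# In-memory stand-in for the 'urls' collection. Field names follow the query
# and sort hashes used by #urls; the records themselves are made up.
SAMPLE_URL_RECORDS = [
  { url: 'http://example.com/b', crawled: true,  date_added: 2 },
  { url: 'http://example.com/a', crawled: false, date_added: 1 },
  { url: 'http://example.com/c', crawled: false, date_added: 3 }
].freeze

# Hypothetical helper approximating #urls: filter on crawled (nil means no
# filter), sort by date_added ascending, then apply skip and limit.
def select_urls(records, crawled: nil, limit: 0, skip: 0)
  matched = crawled.nil? ? records : records.select { |r| r[:crawled] == crawled }
  sorted  = matched.sort_by { |r| r[:date_added] }
  paged   = sorted.drop(skip)
  paged   = paged.first(limit) if limit.positive? # 0 means no limit.

  results = paged.map { |r| r[:url] }
  results.each { |url| yield(url) } if block_given?
  results
end

p select_urls(SAMPLE_URL_RECORDS)                          # all, oldest first
p select_urls(SAMPLE_URL_RECORDS, crawled: false, skip: 1) # second uncrawled url
select_urls(SAMPLE_URL_RECORDS, limit: 1) { |url| puts url }
```

As in #urls, the block (when given) is called once per result and the full result Array is still returned.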