Class: Wgit::Database::InMemory
- Inherits: DatabaseAdapter
  - Object
  - DatabaseAdapter
  - Wgit::Database::InMemory
- Defined in: lib/wgit/database/adapters/in_memory.rb
Overview
Database implementer class for in-memory (RAM) storage. This DB is mainly intended for testing and experimentation. It is thread safe: the urls and docs collections are stored as Concurrent::Arrays.
Constant Summary
Constants inherited from DatabaseAdapter
Constants included from Assertable
Assertable::DEFAULT_DUCK_FAIL_MSG, Assertable::DEFAULT_REQUIRED_KEYS_MSG, Assertable::DEFAULT_TYPE_FAIL_MSG, Assertable::MIXED_ENUMERABLE_MSG, Assertable::NON_ENUMERABLE_MSG
Instance Method Summary
- #bulk_upsert(objs) ⇒ Integer
  Bulk upserts the objects in the in-memory database collection.
- #doc_hashes ⇒ Object
  The raw doc Hashes, not mapped into their corresponding Wgit objects.
- #docs(&block) ⇒ Object
  The Wgit::Document collection, stored as an in-memory Concurrent::Array.
- #empty ⇒ Integer
  Deletes everything in the urls and documents collections.
- #initialize(connection_string = nil) ⇒ InMemory (constructor)
  Initializes a thread safe InMemory Database instance.
- #inspect ⇒ String
  Overrides Object#inspect to display collection sizes.
- #search(query, case_sensitive: false, whole_sentence: true, limit: 10, skip: 0) {|doc| ... } ⇒ Array<Wgit::Document>
  Searches the database's Document#text for the given query.
- #size ⇒ Integer
  Returns the current size of the in-memory database.
- #uncrawled_urls(limit: 0, skip: 0) {|url| ... } ⇒ Array<Wgit::Url>
  Returns Url records that haven't yet been crawled.
- #upsert(obj) ⇒ Boolean
  Inserts or updates the object in the in-memory database.
- #url_hashes ⇒ Object
  The raw url Hashes, not mapped into their corresponding Wgit objects.
- #urls(&block) ⇒ Object
  The Wgit::Url collection, stored as an in-memory Concurrent::Array.
Methods included from Assertable
#assert_arr_types, #assert_common_arr_types, #assert_required_keys, #assert_respond_to, #assert_types
Constructor Details
#initialize(connection_string = nil) ⇒ InMemory
Initializes a thread safe InMemory Database instance.
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 15
def initialize(connection_string = nil)
  # Inits @urls and @docs vars.
  initialize_store

  super
end
```
Instance Method Details
#bulk_upsert(objs) ⇒ Integer
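Bulk upserts the objects in the in-memory database collection. You cannot mix object types: all objs must be either Urls or Documents. Returns the number of objects that were newly inserted; updates to existing records are not counted.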
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 152
def bulk_upsert(objs)
  assert_common_arr_types(objs, [Wgit::Url, Wgit::Document])

  objs.reduce(0) do |inserted, obj|
    inserted += 1 if upsert(obj)
    inserted
  end
end
```
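The reduce in bulk_upsert only counts insertions, because upsert returns false when it updates an existing record. A self-contained sketch of that counting pattern, assuming a hypothetical `upsert` lambda that treats a value as "new" the first time it is seen (the real matching rules live in the adapter's get_model_info and are not shown here); `count_inserted` is likewise a hypothetical helper:

```ruby
require "set"

# Hypothetical stand-in for InMemory#upsert: true the first time a value is
# seen (an insert), false afterwards (an update).
seen = Set.new
upsert = ->(obj) { seen.add?(obj) ? true : false }

# Same counting pattern as bulk_upsert: only inserts increment the total.
def count_inserted(objs, upsert)
  objs.reduce(0) do |inserted, obj|
    inserted += 1 if upsert.call(obj)
    inserted
  end
end

urls = ["http://a.com", "http://b.com", "http://a.com"]
puts count_inserted(urls, upsert) # => 2 (the repeated URL is an update)
```

Returning the insert count (rather than the number of objects processed) lets a caller tell how many genuinely new records a crawl produced.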
#doc_hashes ⇒ Object
The raw doc Hashes, not mapped into their corresponding Wgit objects.
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 46
def doc_hashes
  @docs
end
```
#docs(&block) ⇒ Object
The Wgit::Document collection, stored as an in-memory Concurrent::Array.
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 36
def docs(&block)
  map_documents(@docs, &block)
end
```
#empty ⇒ Integer
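Deletes everything in the urls and documents collections. Returns the number of records deleted (the combined size of both collections before they were reset).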
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 108
def empty
  previous_size = @urls.size + @docs.size
  initialize_store

  previous_size
end
```
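empty captures the combined record count before resetting the store, so callers learn how much was deleted. A plain-Ruby sketch of that pattern; `ToyDatabase`, `add_url`, `add_doc` and `reset_store` are hypothetical, with plain Arrays standing in for the adapter's Concurrent::Array collections and `reset_store` playing the role of initialize_store:

```ruby
# Hypothetical stand-in for the adapter: two collections plus a reset.
class ToyDatabase
  def initialize
    reset_store
  end

  def add_url(hash)
    @urls << hash
  end

  def add_doc(hash)
    @docs << hash
  end

  # Mirrors InMemory#empty: sum both sizes, reset the store, return the sum.
  def empty
    previous_size = @urls.size + @docs.size
    reset_store
    previous_size
  end

  private

  # Plays the role of initialize_store (assumed here to simply replace both
  # collections with new empty ones).
  def reset_store
    @urls = []
    @docs = []
  end
end

db = ToyDatabase.new
db.add_url({ "url" => "http://example.com" })
db.add_doc({ "url" => "http://example.com", "html" => "<p>Hi</p>" })
p db.empty # => 2
p db.empty # => 0
```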
#inspect ⇒ String
Overrides Object#inspect to display collection sizes.
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 25
def inspect
  "#<Wgit::Database::InMemory num_urls=#{@urls.size} \
num_docs=#{@docs.size} size=#{size}>"
end
```
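The trailing backslash inside the double-quoted string in inspect is a line continuation: Ruby drops both the backslash and the newline, so the result is a single line. A quick plain-Ruby check, with hypothetical values substituted for @urls.size, @docs.size and size:

```ruby
# Hypothetical sizes standing in for @urls.size, @docs.size and #size.
num_urls = 2
num_docs = 1
size = 120

# Backslash-newline inside a double-quoted literal joins the two lines.
summary = "#<Wgit::Database::InMemory num_urls=#{num_urls} \
num_docs=#{num_docs} size=#{size}>"

puts summary
# => #<Wgit::Database::InMemory num_urls=2 num_docs=1 size=120>
```

This is also why the continuation line starts in column 0: any indentation there would become part of the string.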
#search(query, case_sensitive: false, whole_sentence: true, limit: 10, skip: 0) {|doc| ... } ⇒ Array<Wgit::Document>
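Searches the database's Document#text for the given query. The returned Documents are sorted by relevance, starting with the most relevant, and each Document's #score is set accordingly. Use skip to offset the results and limit to cap them (a limit of 0 returns all matches). If a block is given, each result is yielded to it.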
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 73
def search(
  query, case_sensitive: false, whole_sentence: true,
  limit: 10, skip: 0, &block
)
  regex = Wgit::Utils.build_search_regex(
    query, case_sensitive:, whole_sentence:)

  # Search the Wgit::Documents, not the raw Hashes.
  results = docs.select do |doc|
    score = 0
    doc.search(regex, case_sensitive:, whole_sentence:) do |results_hash|
      score = results_hash.values.sum
    end
    next false if score.zero?

    doc.instance_variable_set :@score, score
    true
  end

  return [] if results.empty?

  results = results.sort_by { |doc| -doc.score }

  results = results[skip..]
  return [] unless results

  results = results[0...limit] if limit.positive?
  results.each(&block) if block_given?

  results
end
```
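The score, filter, sort, skip and limit steps in search can be reproduced on plain strings. This sketch assumes a simplified score (the number of case-insensitive regex matches in each text) in place of Document#search's results_hash, and builds the regex with Regexp.escape rather than Wgit::Utils.build_search_regex; `rank` is a hypothetical helper name:

```ruby
# Simplified relevance search over plain strings, mirroring the steps in
# InMemory#search: score, drop zero scores, sort descending, skip, limit.
def rank(texts, query, limit: 10, skip: 0)
  regex = Regexp.new(Regexp.escape(query), Regexp::IGNORECASE)

  scored = texts.filter_map do |text|
    score = text.scan(regex).size # assumed scoring: match count
    [text, score] unless score.zero?
  end
  return [] if scored.empty?

  scored = scored.sort_by { |_text, score| -score }

  page = scored[skip..]
  return [] unless page # a skip past the end of the array yields nil

  page = page[0...limit] if limit.positive? # limit: 0 means "no limit"
  page
end

texts = [
  "Ruby is fun",
  "ruby, Ruby, RUBY!",
  "Python only",
  "I like ruby and Ruby gems"
]

rank(texts, "ruby").each { |text, score| puts "#{score}: #{text}" }
# 3: ruby, Ruby, RUBY!
# 2: I like ruby and Ruby gems
# 1: Ruby is fun
```

Note the nil guard after slicing: `array[skip..]` returns nil only when skip exceeds the array's length, which is why both the sketch and the original return an empty Array in that case.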
#size ⇒ Integer
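Returns the current size of the in-memory database, measured as the combined length of the String representations (#to_s) of the urls and docs collections, not actual memory usage. An empty database therefore returns a size of 4, because each empty collection's String form, "[]", is 2 characters long.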
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 55
def size
  @urls.to_s.size + @docs.to_s.size
end
```
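A worked check of the size arithmetic using plain Arrays as hypothetical stand-ins for @urls and @docs (this assumes Concurrent::Array#to_s renders like Array#to_s, as it does on CRuby, where Concurrent::Array is backed by ::Array):

```ruby
# Hypothetical stand-ins for the adapter's @urls and @docs collections.
urls = []
docs = []

# Same formula as InMemory#size: the combined length of each collection's
# String representation (String#size counts characters).
db_size = ->(u, d) { u.to_s.size + d.to_s.size }

p([].to_s)                 # => "[]"
p db_size.call(urls, docs) # => 4

urls << { "url" => "http://example.com" }
p db_size.call(urls, docs) # larger: the inspected Hash is now included
```

The exact non-empty value is deliberately not shown, since Hash#inspect formatting differs between Ruby versions.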
#uncrawled_urls(limit: 0, skip: 0) {|url| ... } ⇒ Array<Wgit::Url>
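Returns Url records that haven't yet been crawled. A limit of 0 (the default) returns all uncrawled Urls, and skip offsets the start of the results. The optional block is passed through to map_urls.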
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 121
def uncrawled_urls(limit: 0, skip: 0, &block)
  uncrawled = @urls.reject { |url| url["crawled"] }
  uncrawled = uncrawled[skip..]
  return [] unless uncrawled

  uncrawled = uncrawled[0...limit] if limit.positive?

  map_urls(uncrawled, &block)
end
```
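The filter-then-paginate logic of uncrawled_urls, sketched on raw url Hashes. `uncrawled` is a hypothetical helper, and the final mapping into Wgit::Url objects via map_urls is omitted:

```ruby
# Mirrors InMemory#uncrawled_urls on raw Hashes, without map_urls.
def uncrawled(url_hashes, limit: 0, skip: 0)
  pending = url_hashes.reject { |url| url["crawled"] }
  pending = pending[skip..]
  return [] unless pending # a skip past the end of the array yields nil

  pending = pending[0...limit] if limit.positive? # limit: 0 means "no limit"
  pending
end

url_hashes = [
  { "url" => "http://a.com", "crawled" => true },
  { "url" => "http://b.com", "crawled" => false },
  { "url" => "http://c.com" }, # a missing "crawled" key counts as uncrawled
  { "url" => "http://d.com", "crawled" => false }
]

p uncrawled(url_hashes).map { |u| u["url"] }
# => ["http://b.com", "http://c.com", "http://d.com"]
p uncrawled(url_hashes, limit: 1, skip: 1).map { |u| u["url"] }
# => ["http://c.com"]
```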
#upsert(obj) ⇒ Boolean
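Inserts or updates the object in the in-memory database. Returns true if the object was inserted as a new record, or false if an existing record was updated.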
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 134
def upsert(obj)
  collection, index, model = get_model_info(obj)

  if index
    collection[index] = model
    false
  else
    collection << model
    true
  end
end
```
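A minimal plain-Ruby sketch of upsert's insert-or-replace contract. It assumes records are Hashes matched on their "url" value; `ToyStore` is hypothetical, and the index lookup stands in for get_model_info, whose actual matching rules are not shown on this page:

```ruby
# Hypothetical toy store illustrating InMemory#upsert's return values.
class ToyStore
  attr_reader :urls

  def initialize
    @urls = []
  end

  # Returns true when a new record is appended, false when an existing
  # record (matched by its "url" value) is replaced in place.
  def upsert(record)
    index = @urls.index { |existing| existing["url"] == record["url"] }

    if index
      @urls[index] = record
      false
    else
      @urls << record
      true
    end
  end
end

store = ToyStore.new
p store.upsert({ "url" => "http://example.com", "crawled" => false }) # => true
p store.upsert({ "url" => "http://example.com", "crawled" => true })  # => false
p store.urls.size                                                     # => 1
```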
#url_hashes ⇒ Object
The raw url Hashes, not mapped into their corresponding Wgit objects.
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 41
def url_hashes
  @urls
end
```
#urls(&block) ⇒ Object
The Wgit::Url collection, stored as an in-memory Concurrent::Array.
```ruby
# File 'lib/wgit/database/adapters/in_memory.rb', line 31
def urls(&block)
  map_urls(@urls, &block)
end
```