Class: Wgit::Database::InMemory

Inherits:
DatabaseAdapter
Defined in:
lib/wgit/database/adapters/in_memory.rb

Overview

Database implementer class for in-memory (RAM) storage. This DB is mainly intended for testing and experimentation. It is thread safe.

Constant Summary

Constants inherited from DatabaseAdapter

DatabaseAdapter::NOT_IMPL_ERR

Constants included from Assertable

Assertable::DEFAULT_DUCK_FAIL_MSG, Assertable::DEFAULT_REQUIRED_KEYS_MSG, Assertable::DEFAULT_TYPE_FAIL_MSG, Assertable::MIXED_ENUMERABLE_MSG, Assertable::NON_ENUMERABLE_MSG

Instance Method Summary

Methods included from Assertable

#assert_arr_types, #assert_common_arr_types, #assert_required_keys, #assert_respond_to, #assert_types

Constructor Details

#initialize(connection_string = nil) ⇒ InMemory

Initializes a thread safe InMemory Database instance.

Parameters:

  • connection_string (String) (defaults to: nil)

    Not used but needed to adhere to the DatabaseAdapter interface.



# File 'lib/wgit/database/adapters/in_memory.rb', line 15

def initialize(connection_string = nil)
  # Inits @urls and @docs vars.
  initialize_store

  super
end

Instance Method Details

#bulk_upsert(objs) ⇒ Integer

Bulk upserts the objects in the in-memory database collection. You cannot mix object types: all objs must be Urls, or all must be Documents.

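Parameters:

  • objs (Array<Wgit::Url>, Array<Wgit::Document>)

    The objects to upsert; all must be of the same type.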

Returns:

  • (Integer)

    The total number of newly inserted objects.



# File 'lib/wgit/database/adapters/in_memory.rb', line 152

def bulk_upsert(objs)
  assert_common_arr_types(objs, [Wgit::Url, Wgit::Document])

  objs.reduce(0) do |inserted, obj|
    inserted += 1 if upsert(obj)
    inserted
  end
end
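
The counting behaviour in the listing above can be sketched without Wgit. This is a minimal stand-in, not the adapter itself: it assumes records are plain Hashes identified by their "url" key, and `upsert_hash` and `bulk_upsert_hashes` are hypothetical names that do not exist in Wgit.

```ruby
# Hypothetical stand-in for the adapter: a plain Array of Hashes keyed by "url".
# upsert_hash and bulk_upsert_hashes are illustrative names, not Wgit methods.
def upsert_hash(store, record)
  index = store.index { |existing| existing["url"] == record["url"] }

  if index
    store[index] = record # Update in place; not counted as an insert.
    false
  else
    store << record       # New record; counted as an insert.
    true
  end
end

# Equivalent to the reduce in the listing: count the upserts that returned true.
def bulk_upsert_hashes(store, records)
  records.count { |record| upsert_hash(store, record) }
end

store = [{ "url" => "http://example.com", "crawled" => false }]
inserted = bulk_upsert_hashes(store, [
  { "url" => "http://example.com", "crawled" => true }, # Existing: updated.
  { "url" => "http://example.org", "crawled" => false } # New: inserted.
])
```

Here `inserted` counts only the new record, while the existing record is overwritten with its updated Hash.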

#doc_hashes ⇒ Object

The raw doc Hashes, not mapped into their corresponding Wgit objects.



# File 'lib/wgit/database/adapters/in_memory.rb', line 46

def doc_hashes
  @docs
end

#docs(&block) ⇒ Object

The Wgit::Document collection, stored as an in-memory Concurrent::Array.



# File 'lib/wgit/database/adapters/in_memory.rb', line 36

def docs(&block)
  map_documents(@docs, &block)
end

#empty ⇒ Integer

Deletes everything in the urls and documents collections.

Returns:

  • (Integer)

    The number of deleted records.



# File 'lib/wgit/database/adapters/in_memory.rb', line 108

def empty
  previous_size = @urls.size + @docs.size
  initialize_store

  previous_size
end

#inspect ⇒ String

Overrides Object#inspect to display collection sizes.

Returns:

  • (String)

    A short textual representation of this object.



# File 'lib/wgit/database/adapters/in_memory.rb', line 25

def inspect
  "#<Wgit::Database::InMemory num_urls=#{@urls.size} \
num_docs=#{@docs.size} size=#{size}>"
end
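
The listing above splits one double-quoted string across two lines with a trailing backslash, which Ruby treats as a line continuation, so no newline ends up in the result. A minimal sketch of the same pattern, using a hypothetical `TinyStore` class rather than the real adapter:

```ruby
# TinyStore is a hypothetical class used only to illustrate the pattern.
class TinyStore
  def initialize
    @urls = []
    @docs = []
  end

  # The backslash at the end of the first line joins both lines into a
  # single string, so the output contains no newline.
  def inspect
    "#<TinyStore num_urls=#{@urls.size} \
num_docs=#{@docs.size}>"
  end
end

text = TinyStore.new.inspect
```

The continuation line starts at column zero on purpose: any leading whitespace would become part of the string.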

#search(query, case_sensitive: false, whole_sentence: true, limit: 10, skip: 0) {|doc| ... } ⇒ Array<Wgit::Document>

Searches the database's Document#text for the given query. The returned Documents are sorted by relevance, most relevant first. Each Document's #score value is set accordingly.

Parameters:

  • query (Regexp, #to_s)

    The regex or text value to search each document's @text for.

  • case_sensitive (Boolean) (defaults to: false)

    Whether character case must match.

  • whole_sentence (Boolean) (defaults to: true)

    Whether the query is matched as a whole sentence. If false, multiple words are searched for separately.

  • limit (Integer) (defaults to: 10)

    The max number of results to return. 0 returns all.

  • skip (Integer) (defaults to: 0)

    The number of results to skip.

Yields:

  • (doc)

    Given each search result (Wgit::Document) returned from the DB.

Returns:

  • (Array<Wgit::Document>)

    The search results obtained from the DB.



# File 'lib/wgit/database/adapters/in_memory.rb', line 73

def search(
  query, case_sensitive: false, whole_sentence: true,
  limit: 10, skip: 0, &block
)
  regex = Wgit::Utils.build_search_regex(
    query, case_sensitive:, whole_sentence:)

  # Search the Wgit::Document's, not the raw Hashes.
  results = docs.select do |doc|
    score = 0
    doc.search(regex, case_sensitive:, whole_sentence:) do |results_hash|
      score = results_hash.values.sum
    end
    next false if score.zero?

    doc.instance_variable_set :@score, score
    true
  end

  return [] if results.empty?

  results = results.sort_by { |doc| -doc.score }

  results = results[skip..]
  return [] unless results

  results = results[0...limit] if limit.positive?
  results.each(&block) if block_given?

  results
end
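
The ranking and paging at the end of the listing above can be tried in isolation. This is a sketch under stated assumptions: `Result` is a hypothetical Struct standing in for Wgit::Document, and `paginate` is an illustrative helper that does not exist in Wgit.

```ruby
# Hypothetical result type standing in for Wgit::Document; only score matters.
Result = Struct.new(:title, :score)

# Sorts by descending score, then applies skip and limit as the listing does:
# a skip past the end yields [], and a non-positive limit returns everything
# that remains after skipping.
def paginate(results, limit: 10, skip: 0)
  ranked = results.sort_by { |result| -result.score }
  ranked = ranked[skip..]
  return [] unless ranked # Array#[] returns nil when skip exceeds the size.

  limit.positive? ? ranked[0...limit] : ranked
end

results = [Result.new("a", 1), Result.new("b", 5), Result.new("c", 3)]

top_two   = paginate(results, limit: 2).map(&:title)
second    = paginate(results, limit: 1, skip: 1).map(&:title)
unlimited = paginate(results, limit: 0).map(&:title)
past_end  = paginate(results, skip: 10)
```

The `return [] unless ranked` guard matters because slicing an Array with a start index beyond its length returns nil rather than an empty Array.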

#size ⇒ Integer

Returns the current size of the in-memory database, measured as the combined length of the string representations of the urls and docs collections. An empty database returns a size of 4 because each empty collection is represented as "[]", which is 2 characters long.

Returns:

  • (Integer)

    The current size of the in-memory DB.



# File 'lib/wgit/database/adapters/in_memory.rb', line 55

def size
  @urls.to_s.size + @docs.to_s.size
end
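
The size of 4 for an empty database follows directly from the listing above. A minimal sketch, assuming plain Arrays in place of the adapter's collections:

```ruby
# Two empty collections, as in a freshly initialised or emptied store.
urls = []
docs = []

# Array#to_s returns "[]" for an empty Array, which is 2 characters long.
empty_size = urls.to_s.size + docs.to_s.size

# Adding a record grows the size by the length of its string representation.
urls << { "url" => "http://example.com" }
grown_size = urls.to_s.size + docs.to_s.size
```

Because String#size counts characters, the value is best read as a rough indicator of how much data is stored rather than as an exact memory footprint.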

#uncrawled_urls(limit: 0, skip: 0) {|url| ... } ⇒ Array<Wgit::Url>

Returns Url records that haven't yet been crawled.

Parameters:

  • limit (Integer) (defaults to: 0)

    The max number of Url's to return. 0 returns all.

  • skip (Integer) (defaults to: 0)

    The number of Url's to skip.

Yields:

  • (url)

    Given each Url object (Wgit::Url) returned from the DB.

Returns:

  • (Array<Wgit::Url>)

    The uncrawled Urls obtained from the DB.



# File 'lib/wgit/database/adapters/in_memory.rb', line 121

def uncrawled_urls(limit: 0, skip: 0, &block)
  uncrawled = @urls.reject { |url| url["crawled"] }
  uncrawled = uncrawled[skip..]
  return [] unless uncrawled

  uncrawled = uncrawled[0...limit] if limit.positive?
  map_urls(uncrawled, &block)
end
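
Because the listing above filters with `reject { |url| url["crawled"] }`, a record counts as uncrawled when its "crawled" value is false or nil. A small sketch of that filter, assuming hypothetical url Hashes and an illustrative `uncrawled` helper that skips the mapping to Wgit::Url:

```ruby
# Hypothetical url records; only the "crawled" key is relevant here.
urls = [
  { "url" => "http://a.example", "crawled" => true },
  { "url" => "http://b.example", "crawled" => false },
  { "url" => "http://c.example", "crawled" => nil },
  { "url" => "http://d.example", "crawled" => false }
]

# Same filtering and paging as the listing, without mapping to Wgit::Url.
def uncrawled(urls, limit: 0, skip: 0)
  pending = urls.reject { |url| url["crawled"] } # Keeps false and nil.
  pending = pending[skip..]
  return [] unless pending

  limit.positive? ? pending[0...limit] : pending
end

all_pending = uncrawled(urls).map { |url| url["url"] }
one_page    = uncrawled(urls, limit: 1, skip: 1).map { |url| url["url"] }
```

With the default `limit: 0`, every uncrawled record after the skip is returned.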

#upsert(obj) ⇒ Boolean

Inserts or updates the object in the in-memory database.

Parameters:

  • obj (Wgit::Url, Wgit::Document)

    The Url or Document to insert or update.

Returns:

  • (Boolean)

    True if inserted, false if updated.



# File 'lib/wgit/database/adapters/in_memory.rb', line 134

def upsert(obj)
  collection, index, model = get_model_info(obj)

  if index
    collection[index] = model
    false
  else
    collection << model
    true
  end
end

#url_hashes ⇒ Object

The raw url Hashes, not mapped into their corresponding Wgit objects.



# File 'lib/wgit/database/adapters/in_memory.rb', line 41

def url_hashes
  @urls
end

#urls(&block) ⇒ Object

The Wgit::Url collection, stored as an in-memory Concurrent::Array.



# File 'lib/wgit/database/adapters/in_memory.rb', line 31

def urls(&block)
  map_urls(@urls, &block)
end