Class: Documentrix::Documents::Cache::SQLiteCache

Inherits:
Object
Includes:
Common, Enumerable
Defined in:
lib/documentrix/documents/cache/sqlite_cache.rb

Instance Attribute Summary

Attributes included from Common

#prefix

Instance Method Summary

Methods included from Common

#collections, #pre, #unpre

Methods included from Utils::Math

#cosine_similarity, #norm

Constructor Details

#initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false) ⇒ void

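The initialize method sets up the cache: it passes prefix to super, stores embedding_length, filename, and debug in instance variables, and then calls setup_database(filename) to prepare the SQLite database.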

Parameters:

  • prefix (String)

    the prefix for keys

  • embedding_length (Integer) (defaults to: 1_024)

    the length of the embeddings vector

  • filename (String) (defaults to: ':memory:')

    the name of the SQLite database file or ':memory:' for in-memory.

  • debug (FalseClass, TrueClass) (defaults to: false)

    whether to enable debugging



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 18

def initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false)
  super(prefix:)
  @embedding_length = embedding_length
  @filename         = filename
  @debug            = debug
  setup_database(filename)
end

Instance Attribute Details

#embedding_length ⇒ Object (readonly)

length of the embeddings vector



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 28

def embedding_length
  @embedding_length
end

#filename ⇒ Object (readonly)

filename of the SQLite database; ':memory:' selects an in-memory database



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 26

def filename
  @filename
end

Instance Method Details

#[](key) ⇒ Documentrix::Documents::Record, NilClass

The method retrieves the value associated with the given key from the cache.

Parameters:

  • key (String)

    The key for which to retrieve the value.

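Returns:

  • (Documentrix::Documents::Record, NilClass)

    the record stored under the key, or nil if the key does not exist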



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 37

def [](key)
  result = execute(
    %{
      SELECT records.key, records.text, records.norm, records.source,
        records.tags, embeddings.embedding
      FROM records
      INNER JOIN embeddings ON records.embedding_id = embeddings.rowid
      WHERE records.key = ?
    },
    pre(key)
  )&.first or return
  key, text, norm, source, tags, embedding = *result
  embedding = embedding.unpack("f*")
  tags      = Documentrix::Utils::Tags.new(JSON(tags.to_s).to_a, source:)
  convert_value_to_record(key:, text:, norm:, source:, tags:, embedding:)
end

#[]=(key, value) ⇒ Object

The []= method sets the value for a given key by inserting it into the database.

Parameters:

  • key (String)

    the key to set

  • value (Hash, Documentrix::Documents::Record)

    the hash or record containing the text, embedding, and other metadata



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 60

def []=(key, value)
  value = convert_value_to_record(value)
  embedding = value.embedding.pack("f*")
  execute(%{BEGIN})
  execute(%{INSERT INTO embeddings(embedding) VALUES(?)}, [ embedding ])
  embedding_id, = execute(%{ SELECT last_insert_rowid() }).flatten
  execute(%{
    INSERT INTO records(key,text,embedding_id,norm,source,tags)
    VALUES(?,?,?,?,?,?)
  }, [ pre(key), value.text, embedding_id, value.norm, value.source, JSON(value.tags) ])
  execute(%{COMMIT})
end
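
The two accessors above store each embedding as a binary string: []= packs the vector with pack("f*") and [] restores it with unpack("f*"). The following is a minimal sketch of that round trip in core Ruby, using an invented three-component vector rather than a real embedding. It illustrates that "f*" means native-endian single-precision floats, so each component takes 4 bytes and values that are not exactly representable in 32 bits come back slightly changed.

```ruby
# Invented example vector; real embeddings have embedding_length components.
embedding = [0.5, -1.25, 0.1]

blob     = embedding.pack("f*")   # binary string of 32-bit floats
restored = blob.unpack("f*")      # back to an Array of Float

puts blob.bytesize                # 4 bytes per component
puts restored.inspect             # 0.5 and -1.25 survive exactly, 0.1 does not
```

Because the format is native-endian, a database file written this way is presumably only portable between machines with the same byte order.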

#clear ⇒ Documentrix::Documents::Cache::SQLiteCache

The clear method deletes all records whose key starts with the cache's prefix by executing a SQL DELETE statement, and returns self.



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 147

def clear
  execute(%{DELETE FROM records WHERE key LIKE ?}, [ "#@prefix%" ])
  self
end
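
Several methods on this page select rows with key LIKE ? and bind "#@prefix%" as the pattern. As a rough illustration of what such a pattern matches, the sketch below approximates SQLite's default LIKE behavior in plain Ruby. The helper like_match? and the prefix "docs-" are hypothetical and exist only for this example; the translation ignores ESCAPE clauses and treats case-insensitivity more broadly than SQLite, which by default folds case for ASCII characters only.

```ruby
# Hypothetical helper, not part of Documentrix: approximate SQLite's LIKE.
def like_match?(pattern, string)
  body = pattern.each_char.map do |char|
    case char
    when "%" then ".*"            # any run of characters, including none
    when "_" then "."             # exactly one character
    else Regexp.escape(char)      # everything else matches literally
    end
  end.join
  Regexp.new("\\A#{body}\\z", Regexp::IGNORECASE | Regexp::MULTILINE).match?(string)
end

prefix = "docs-"                             # assumed example prefix
like_match?("#{prefix}%", "docs-readme")     # true
like_match?("#{prefix}%", "other-readme")    # false
like_match?("%", "other-readme")             # true, "%" alone matches every key
like_match?("a_%", "ab-readme")              # true, "_" acts as a wildcard
```

One consequence worth keeping in mind: a prefix that itself contains % or _ would match more keys than a literal prefix comparison.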

#clear_for_tags(tags = nil) ⇒ Documentrix::Documents::Cache::SQLiteCache

The clear_for_tags method deletes the records under the cache's prefix that carry at least one of the given tags. If no tags are given, it falls back to clear and removes every record under the prefix.

Parameters:

  • tags (Array<String>, NilClass) (defaults to: nil)

    An array of tag names to clear from the cache or nil for all records

Returns:

  • (Documentrix::Documents::Cache::SQLiteCache)

    self



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 131

def clear_for_tags(tags = nil)
  tags = Documentrix::Utils::Tags.new(tags).to_a
  if tags.present?
    records = find_records_for_tags(tags)
    keys = '(%s)' % records.transpose.first.map { "'%s'" % quote(_1) }.join(?,)
    execute(%{DELETE FROM records WHERE key IN #{keys}})
  else
    clear
  end
  self
end
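
clear_for_tags builds its IN (...) list by quoting each key and interpolating the result into the statement. As a point of comparison only, the sketch below shows a different technique: generating one ? placeholder per key and binding the keys as parameters. The helper delete_by_keys_statement is hypothetical and not part of Documentrix. It also returns early for an empty key list, a case the listing above does not guard explicitly.

```ruby
# Hypothetical helper, not part of Documentrix: parameterized IN list.
def delete_by_keys_statement(keys)
  return nil if keys.empty?                       # nothing to delete
  placeholders = (["?"] * keys.size).join(",")    # one "?" per key
  ["DELETE FROM records WHERE key IN (#{placeholders})", keys]
end

sql, binds = delete_by_keys_statement(["docs-a", "docs-o'brien"])
puts sql              # the statement text contains no key material
puts binds.inspect    # keys travel separately as bind values
```

The trade-off is that SQLite limits the number of bound parameters per statement, so very large key lists would need batching under this approach.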

#convert_to_vector(vector) ⇒ Array

The convert_to_vector method returns the input vector itself, because conversion isn't necessary for this cache class.

Parameters:

  • vector (Array)

    the input vector

Returns:

  • (Array)

    the (not) converted vector



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 195

def convert_to_vector(vector)
  vector
end

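#delete(key) ⇒ FalseClass, TrueClass

The delete method removes a key from the cache by executing a SQL DELETE statement and reports whether the key existed beforehand.

Parameters:

  • key (String)

    the key to be deleted

Returns:

  • (FalseClass, TrueClass)

    true if the key existed before deletion, false otherwise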


# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 91

def delete(key)
  result = key?(key)
  execute(
    %{ DELETE FROM records WHERE records.key = ? },
    pre(key)
  )
  result
end

#each(prefix: "#@prefix%") {|key, value| ... } ⇒ Object

The each method iterates over records matching the given prefix and yields them to the block.

Examples:

cache.each do |key, value|
  puts "#{key}: #{value}"
end

Parameters:

  • prefix (String) (defaults to: "#@prefix%")

    the prefix to match

Yields:

  • (key, value)

    where key is the record's key and value is the record itself



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 162

def each(prefix: "#@prefix%", &block)
  execute(%{
    SELECT records.key, records.text, records.norm, records.source,
      records.tags, embeddings.embedding
    FROM records
    INNER JOIN embeddings ON records.embedding_id = embeddings.rowid
    WHERE records.key LIKE ?
  }, [ prefix ]).each do |key, text, norm, source, tags, embedding|
    embedding = embedding.unpack("f*")
    tags      = Documentrix::Utils::Tags.new(JSON(tags.to_s).to_a, source:)
    value     = convert_value_to_record(key:, text:, norm:, source:, tags:, embedding:)
    block.(key, value)
  end
  self
end

#find_records(needle, tags: nil, max_records: nil) {|key, value| ... } ⇒ Array<Documentrix::Documents::Record>

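The find_records method returns the records whose embeddings are nearest to the needle, optionally restricted to records that carry at least one of the given tags. The number of results is limited to the smallest of max_records, the cache size, and 4,096.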

Examples:

documents.find_records([ 0.1 ] * 1_024, tags: %w[ test ])

Parameters:

  • needle (Array)

    the embedding vector

  • tags (Array) (defaults to: nil)

    the list of tags to filter by (optional)

  • max_records (Integer) (defaults to: nil)

    the maximum number of records to return (optional)

Yields:

  • (key, value)

Returns:

  • (Array<Documentrix::Documents::Record>)

    the matching records

Raises:

  • (ArgumentError)

    if needle size does not match embedding length



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 240

def find_records(needle, tags: nil, max_records: nil)
  needle.size != @embedding_length and
    raise ArgumentError, "needle embedding length != %s" % @embedding_length
  needle_binary = needle.pack("f*")
  max_records   = [ max_records, size, 4_096 ].compact.min
  records = find_records_for_tags(tags)
  rowids_where = '(%s)' % records.transpose.last&.join(?,)
  execute(%{
    SELECT records.key, records.text, records.norm, records.source,
      records.tags, embeddings.embedding
    FROM records
    INNER JOIN embeddings ON records.embedding_id = embeddings.rowid
    WHERE embeddings.rowid IN #{rowids_where}
      AND embeddings.embedding MATCH ? AND embeddings.k = ?
  }, [ needle_binary, max_records ]).map do |key, text, norm, source, tags, embedding|
    key       = unpre(key)
    embedding = embedding.unpack("f*")
    tags      = Documentrix::Utils::Tags.new(JSON(tags.to_s).to_a, source:)
    convert_value_to_record(key:, text:, norm:, source:, tags:, embedding:)
  end
end
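
Before running its query, find_records applies two guards: it rejects a needle whose size differs from embedding_length, and it clamps max_records to the cache size and a hard cap of 4,096. The sketch below restates both guards as stand-alone functions so their behavior can be checked in isolation. The names check_needle! and effective_limit are hypothetical and do not exist in the library.

```ruby
# Hypothetical stand-alone versions of the two guards in find_records.
def check_needle!(needle, embedding_length)
  needle.size == embedding_length or
    raise ArgumentError, "needle embedding length != %s" % embedding_length
  needle
end

def effective_limit(max_records, size, hard_cap = 4_096)
  # compact drops a nil max_records, so the cache size and cap still apply
  [max_records, size, hard_cap].compact.min
end

effective_limit(nil, 10)          # 10, bounded by the cache size
effective_limit(5, 10)            # 5, the caller's limit wins
effective_limit(10_000, 20_000)   # 4096, the hard cap wins
check_needle!([0.1] * 4, 4)       # passes; a 3-element needle would raise
```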

#find_records_for_tags(tags) ⇒ Array

The find_records_for_tags method filters records based on the provided tags.

Parameters:

  • tags (Array)

    an array of tag names

Returns:

  • (Array)

    an array of [key, tags, embedding_id] rows for the matching records



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 204

def find_records_for_tags(tags)
  if tags.present?
    tags_filter = Documentrix::Utils::Tags.new(tags).to_a
    unless tags_filter.empty?
      tags_where = ' AND (%s)' % tags_filter.map {
        'tags LIKE "%%%s%%"' % quote(_1)
      }.join(' OR ')
    end
  end
  records = execute(%{
    SELECT key, tags, embedding_id
    FROM records
    WHERE key LIKE ?#{tags_where}
  }, [ "#@prefix%" ])
  if tags_filter
    records = records.select { |key, tags, embedding_id|
      (tags_filter & JSON(tags.to_s).to_a).size >= 1
    }
  end
  records
end
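
find_records_for_tags filters in two stages: a LIKE condition in SQL narrows the rows by substring, and a Ruby select then keeps only rows whose parsed tag list actually intersects the requested tags. The sketch below replays both stages on invented in-memory rows to show why the second stage matters. The substring test with include? is only a stand-in for the SQL LIKE condition, and the row data is made up for the example.

```ruby
require "json"

# Invented rows shaped like the SELECT result: [key, tags_json, embedding_id].
rows = [
  ["docs-a", '["test"]',    1],
  ["docs-b", '["testing"]', 2],
  ["docs-c", '["other"]',   3],
]
wanted = ["test"]

# Stage 1: substring match, standing in for tags LIKE "%test%".
candidates = rows.select do |_key, tags, _id|
  wanted.any? { |tag| tags.include?(tag) }
end

# Stage 2: exact match against the parsed JSON tag list.
matches = candidates.select do |_key, tags, _id|
  (wanted & JSON.parse(tags)).size >= 1
end

puts candidates.map(&:first).inspect   # "testing" slips through stage 1
puts matches.map(&:first).inspect      # only the exact tag remains
```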

#full_each {|key, value| ... } ⇒ Documentrix::Documents::Cache::SQLiteCache

The full_each method iterates over all keys and values in the cache, regardless of their prefix.

Yields:

  • (key, value)

Returns:

  • (Documentrix::Documents::Cache::SQLiteCache)

    self



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 185

def full_each(&block)
  each(prefix: ?%, &block)
end

#key?(key) ⇒ FalseClass, TrueClass

The key? method checks if the given key exists in the cache by executing a SQL query.

Parameters:

  • key (String)

    the key to check for existence

Returns:

  • (FalseClass, TrueClass)

    true if the key exists, false otherwise



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 79

def key?(key)
  execute(
    %{ SELECT count(records.key) FROM records WHERE records.key = ? },
    pre(key)
  ).flatten.first == 1
end

#size ⇒ Integer

The size method returns the number of records stored in the cache under its prefix.

Returns:

  • (Integer)

    the count of records



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 119

def size
  execute(%{SELECT COUNT(*) FROM records WHERE key LIKE ?}, [ "#@prefix%" ]).flatten.first
end

#tags ⇒ Documentrix::Utils::Tags

The tags method collects the unique tags of all records stored under the cache's prefix.

Returns:

  • (Documentrix::Utils::Tags)

    An instance of Documentrix::Utils::Tags containing all unique tags found in the database.



# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 104

def tags
  result = Documentrix::Utils::Tags.new
  execute(%{
      SELECT DISTINCT(tags) FROM records WHERE key LIKE ?
    }, [ "#@prefix%" ]
  ).flatten.each do
    JSON(_1).each { |t| result.add(t) }
  end
  result
end
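
Note that SELECT DISTINCT(tags) deduplicates whole JSON arrays, not individual tags, so the same tag can still appear in several returned rows. The final deduplication therefore has to happen in the result container. The sketch below illustrates this with invented column values and a plain Set as a stand-in for Documentrix::Utils::Tags, on the assumption that Tags behaves like a set when add is called.

```ruby
require "json"
require "set"

# Invented distinct values of the tags column, each one a JSON array.
distinct_tag_columns = ['["ruby","sqlite"]', '["ruby"]', '[]']

all_tags = Set.new   # assumption: stands in for Documentrix::Utils::Tags
distinct_tag_columns.each do |column|
  JSON.parse(column).each { |tag| all_tags.add(tag) }
end

puts all_tags.to_a.sort.inspect   # "ruby" appears once despite two rows
```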