Class: Documentrix::Documents::Cache::SQLiteCache
- Inherits:
- Object
- Documentrix::Documents::Cache::SQLiteCache < Object
- Includes:
- Common, Enumerable
- Defined in:
- lib/documentrix/documents/cache/sqlite_cache.rb
Instance Attribute Summary
-
#embedding_length ⇒ Object
readonly
length of the embeddings vector.
-
#filename ⇒ Object
readonly
Filename for the database; :memory: keeps the database in memory.
Attributes included from Common
Instance Method Summary
- #[](key) ⇒ Documentrix::Documents::Record, NilClass
-
#[]=(key, value) ⇒ Object
The []= method sets the value for a given key by inserting it into the database.
-
#clear ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear method deletes all records whose key starts with the cache's prefix by executing a SQL query, and returns the cache itself.
-
#clear_for_tags(tags = nil) ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_for_tags method deletes the records that match the given tags and whose key starts with the cache's prefix. If no tags are given, it clears all records for the prefix.
-
#convert_to_vector(vector) ⇒ Array
The convert_to_vector method returns the input vector itself, because conversion isn't necessary for this cache class.
-
#delete(key) ⇒ FalseClass, TrueClass
The delete method removes a key from the cache by executing a SQL query and returns whether the key existed beforehand.
-
#each(prefix: "#@prefix%") {|key, value| ... } ⇒ Object
The each method iterates over records matching the given prefix and yields them to the block.
-
#find_records(needle, tags: nil, max_records: nil) {|key, value| ... } ⇒ Array<Documentrix::Documents::Record>
The find_records method finds records that match the given needle and tags.
-
#find_records_for_tags(tags) ⇒ Array
The find_records_for_tags method filters records based on the provided tags.
-
#full_each {|key, value| ... } ⇒ Documentrix::Documents::Cache::SQLiteCache
The full_each method iterates over all keys and values in the cache, regardless of their prefix.
-
#initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false) ⇒ void
constructor
The initialize method calls super with the prefix, stores the embedding length, filename and debug flag, and sets up the database for the given filename.
-
#key?(key) ⇒ FalseClass, TrueClass
The key? method checks if the given key exists in the cache by executing a SQL query.
-
#size ⇒ Integer
The size method returns the number of records stored in the cache, that is, the ones whose key starts with the cache's prefix.
-
#tags ⇒ Documentrix::Utils::Tags
The tags method returns the unique tags of all records whose key starts with the cache's prefix, collected in a Documentrix::Utils::Tags object.
Methods included from Common
Methods included from Utils::Math
Constructor Details
#initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false) ⇒ void
The initialize method calls super with the prefix, stores the embedding length, filename and debug flag, and sets up the database for the given filename.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 18
def initialize(prefix:, embedding_length: 1_024, filename: ':memory:', debug: false)
  super(prefix:)
  @embedding_length = embedding_length
  @filename = filename
  @debug = debug
  setup_database(filename)
end
Instance Attribute Details
#embedding_length ⇒ Object (readonly)
length of the embeddings vector
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 28
def embedding_length
  @embedding_length
end
#filename ⇒ Object (readonly)
Filename for the database; :memory: keeps the database in memory.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 26
def filename
  @filename
end
Instance Method Details
#[](key) ⇒ Documentrix::Documents::Record, NilClass
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 37
def [](key)
  result = execute(
    %{
      SELECT records.key, records.text, records.norm, records.source,
        records.tags, embeddings.embedding
      FROM records
      INNER JOIN embeddings ON records.embedding_id = embeddings.rowid
      WHERE records.key = ?
    },
    pre(key)
  )&.first or return
  key, text, norm, source, tags, embedding = *result
  embedding = embedding.unpack("f*")
  tags = Documentrix::Utils::Tags.new(JSON(tags.to_s).to_a, source:)
  convert_value_to_record(key:, text:, norm:, source:, tags:, embedding:)
end
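The row decoding done by #[] can be tried in isolation. The following is a minimal sketch, not part of Documentrix: decode_row and the sample values in the usage note are hypothetical, and it assumes only the column layout of the SELECT above, with the embedding column holding packed single-precision floats and the tags column holding a JSON array.

```ruby
require 'json'

# Hypothetical helper (not part of Documentrix): turns one raw result row
# into a plain Hash, mirroring the unpack and JSON steps used by #[].
def decode_row(row)
  key, text, norm, source, tags, embedding = *row
  {
    key:       key,
    text:      text,
    norm:      norm,
    source:    source,
    tags:      JSON(tags.to_s).to_a,    # JSON text -> Array of tag names
    embedding: embedding.unpack("f*"),  # binary blob -> Array of Floats
  }
end
```

For a made-up row such as ["docs-intro", "Hello", 1.0, "intro.md", '["ruby","sqlite"]', [0.5, -1.25, 2.0].pack("f*")], the helper yields the tags as a Ruby Array and the embedding as an Array of Floats. The real method wraps these values in a Documentrix::Utils::Tags object and a record instead of a Hash.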
#[]=(key, value) ⇒ Object
The []= method sets the value for a given key by inserting it into the database.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 60
def []=(key, value)
  value = convert_value_to_record(value)
  embedding = value.embedding.pack("f*")
  execute(%{BEGIN})
  execute(%{INSERT INTO embeddings(embedding) VALUES(?)}, [ embedding ])
  embedding_id, = execute(%{ SELECT last_insert_rowid() }).flatten
  execute(%{
    INSERT INTO records(key,text,embedding_id,norm,source,tags)
    VALUES(?,?,?,?,?,?)
  }, [ pre(key), value.text, embedding_id, value.norm, value.source, JSON(value.tags) ])
  execute(%{COMMIT})
end
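Because []= serializes the vector with pack("f*"), every component is written as a 4-byte single-precision float. A minimal sketch of what that implies, using only core Ruby; pack_embedding and unpack_embedding are hypothetical names introduced for illustration:

```ruby
# Hypothetical helpers (not part of Documentrix) wrapping the same
# pack/unpack directives that the cache uses for the embedding column.
def pack_embedding(vector)
  vector.pack("f*")    # Array of Floats -> binary String, 4 bytes each
end

def unpack_embedding(blob)
  blob.unpack("f*")    # binary String -> Array of Floats
end

# Values that are exactly representable in single precision survive the
# round trip unchanged; a value such as 0.1 comes back slightly different.
round_trip = unpack_embedding(pack_embedding([0.5, 0.1]))
exact_kept = round_trip[0] == 0.5
lossy_kept = round_trip[1] == 0.1
```

Here exact_kept is true and lossy_kept is false, so vectors read back from the cache are best compared with a tolerance rather than with ==.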
#clear ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear method deletes all records whose key starts with the cache's prefix by executing a SQL query, and returns the cache itself.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 147
def clear
  execute(%{DELETE FROM records WHERE key LIKE ?}, [ "#@prefix%" ])
  self
end
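The prefix scoping used by clear (and by size) can be illustrated without a database. This is only a sketch under stated assumptions: a Hash stands in for the records table, size_for and clear_for are hypothetical names, and start_with? approximates the LIKE pattern built from the prefix.

```ruby
# Rough counterpart of: SELECT COUNT(*) FROM records WHERE key LIKE 'prefix%'
# rows is a Hash of key => value standing in for the records table.
def size_for(rows, prefix)
  rows.count { |key, _value| key.start_with?(prefix) }
end

# Rough counterpart of: DELETE FROM records WHERE key LIKE 'prefix%'
# Removes the matching entries in place and returns the Hash.
def clear_for(rows, prefix)
  rows.delete_if { |key, _value| key.start_with?(prefix) }
end
```

With rows for the prefixes "docs-" and "notes-" in the same Hash, clear_for(rows, "docs-") removes only the "docs-" entries. The approximation is not exact: SQLite's LIKE is case-insensitive for ASCII characters by default, and any % or _ inside the prefix itself acts as a wildcard.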
#clear_for_tags(tags = nil) ⇒ Documentrix::Documents::Cache::SQLiteCache
The clear_for_tags method deletes the records that match the given tags and whose key starts with the cache's prefix. If no tags are given, it clears all records for the prefix.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 131
def clear_for_tags(tags = nil)
  tags = Documentrix::Utils::Tags.new(tags).to_a
  if tags.present?
    records = find_records_for_tags(tags)
    keys = '(%s)' % records.transpose.first.map { "'%s'" % quote(_1) }.join(?,)
    execute(%{DELETE FROM records WHERE key IN #{keys}})
  else
    clear
  end
  self
end
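The key list for the IN clause is assembled as a string. In plain Ruby, [].transpose.first is nil, so with an empty result the chain above would call map on nil; whether that case can occur here depends on code not shown on this page. The sketch below shows one way to build the same list with a guard for the empty case. Both helpers are hypothetical: quote_sql merely stands in for the quote helper, whose definition is not shown here.

```ruby
# Hypothetical stand-in for the quote helper used above: doubles single
# quotes, which is the usual escaping for SQL string literals.
def quote_sql(string)
  string.to_s.gsub("'", "''")
end

# Builds the "('k1','k2')" list for a DELETE ... WHERE key IN (...) clause
# from rows shaped like [key, tags, embedding_id]. Returns nil when there
# are no rows, so that the caller can skip the DELETE altogether.
def in_list_for(rows)
  keys = rows.map(&:first)
  return nil if keys.empty?
  '(%s)' % keys.map { "'%s'" % quote_sql(_1) }.join(',')
end
```

A caller would run the DELETE only when in_list_for returns a string.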
#convert_to_vector(vector) ⇒ Array
The convert_to_vector method returns the input vector itself, because conversion isn't necessary for this cache class.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 195
def convert_to_vector(vector)
  vector
end
#delete(key) ⇒ FalseClass, TrueClass
The delete method removes a key from the cache by executing a SQL query and returns whether the key existed beforehand.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 91
def delete(key)
  result = key?(key)
  execute(
    %{ DELETE FROM records WHERE records.key = ? },
    pre(key)
  )
  result
end
#each(prefix: "#@prefix%") {|key, value| ... } ⇒ Object
The each method iterates over records matching the given prefix and yields them to the block.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 162
def each(prefix: "#@prefix%", &block)
  execute(%{
    SELECT records.key, records.text, records.norm, records.source,
      records.tags, embeddings.embedding
    FROM records
    INNER JOIN embeddings ON records.embedding_id = embeddings.rowid
    WHERE records.key LIKE ?
  }, [ prefix ]).each do |key, text, norm, source, tags, embedding|
    embedding = embedding.unpack("f*")
    tags = Documentrix::Utils::Tags.new(JSON(tags.to_s).to_a, source:)
    value = convert_value_to_record(key:, text:, norm:, source:, tags:, embedding:)
    block.(key, value)
  end
  self
end
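Since the class includes Enumerable, defining each is what makes methods such as map, count and to_a available. The sketch below shows that mechanism on a hypothetical in-memory stand-in; PrefixedStore is not part of Documentrix and uses start_with? where the real method uses a LIKE pattern.

```ruby
# Hypothetical in-memory stand-in (not the real cache): an each method that
# yields key and value lets Enumerable provide the remaining iteration API.
class PrefixedStore
  include Enumerable

  def initialize(prefix, rows)
    @prefix = prefix
    @rows   = rows
  end

  # Yields key and value for every row whose key starts with the prefix and
  # returns self, like the each method documented above.
  def each(prefix: @prefix, &block)
    @rows.each do |key, value|
      block.(key, value) if key.start_with?(prefix)
    end
    self
  end
end
```

A store created as PrefixedStore.new("docs-", rows) can then be used with map { |key, value| ... } or count; Enumerable calls each with its default prefix, so only rows for that prefix are visited.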
#find_records(needle, tags: nil, max_records: nil) {|key, value| ... } ⇒ Array<Documentrix::Documents::Record>
The find_records method finds records that match the given needle and tags.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 240
def find_records(needle, tags: nil, max_records: nil)
  needle.size != @embedding_length and
    raise ArgumentError, "needle embedding length != %s" % @embedding_length
  needle_binary = needle.pack("f*")
  max_records = [ max_records, size, 4_096 ].compact.min
  records = find_records_for_tags(tags)
  rowids_where = '(%s)' % records.transpose.last&.join(?,)
  execute(%{
    SELECT records.key, records.text, records.norm, records.source,
      records.tags, embeddings.embedding
    FROM records
    INNER JOIN embeddings ON records.embedding_id = embeddings.rowid
    WHERE embeddings.rowid IN #{rowids_where}
      AND embeddings.embedding MATCH ? AND embeddings.k = ?
  }, [ needle_binary, max_records ]).map do |key, text, norm, source, tags, embedding|
    key = unpre(key)
    embedding = embedding.unpack("f*")
    tags = Documentrix::Utils::Tags.new(JSON(tags.to_s).to_a, source:)
    convert_value_to_record(key:, text:, norm:, source:, tags:, embedding:)
  end
end
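Two small steps at the top of find_records are easy to overlook: the length check on the needle and the computation of the effective result limit. A minimal sketch of both, with hypothetical helper names (check_needle! and effective_limit are introduced here for illustration only):

```ruby
# Hypothetical helper mirroring the length check at the top of find_records:
# returns the needle if its size matches, raises ArgumentError otherwise.
def check_needle!(needle, embedding_length)
  needle.size == embedding_length or
    raise ArgumentError, "needle embedding length != %s" % embedding_length
  needle
end

# Mirrors the limit computation: the smallest of the requested maximum
# (if one was given), the number of stored records, and the cap of 4_096.
def effective_limit(max_records, size, cap = 4_096)
  [max_records, size, cap].compact.min
end
```

With no max_records given, the limit falls back to the cache size, and it never exceeds 4_096 regardless of the requested value.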
#find_records_for_tags(tags) ⇒ Array
The find_records_for_tags method filters records based on the provided tags.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 204
def find_records_for_tags(tags)
  if tags.present?
    tags_filter = Documentrix::Utils::Tags.new(tags).to_a
    unless tags_filter.empty?
      tags_where = ' AND (%s)' % tags_filter.map {
        'tags LIKE "%%%s%%"' % quote(_1)
      }.join(' OR ')
    end
  end
  records = execute(%{
    SELECT key, tags, embedding_id
    FROM records
    WHERE key LIKE ?#{tags_where}
  }, [ "#@prefix%" ])
  if tags_filter
    records = records.select { |key, tags, embedding_id|
      (tags_filter & JSON(tags.to_s).to_a).size >= 1
    }
  end
  records
end
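The method filters in two stages: a coarse LIKE test in SQL, then an exact intersection on the parsed JSON array in Ruby. The sketch below imitates both stages on made-up rows to show why the second stage is needed; coarse_match and exact_match are hypothetical names, and a substring test only approximates LIKE, which is case-insensitive for ASCII characters by default in SQLite.

```ruby
require 'json'

# Stage 1 (imitates the SQL LIKE "%tag%" test): keeps every row whose tags
# column contains one of the wanted tags anywhere in its JSON text.
# Rows are shaped like [key, tags_json, embedding_id].
def coarse_match(rows, wanted)
  rows.select do |_key, tags_json, _id|
    wanted.any? { |tag| tags_json.include?(tag) }
  end
end

# Stage 2: keeps only rows whose parsed tag list shares at least one whole
# tag with the wanted tags.
def exact_match(rows, wanted)
  rows.select do |_key, tags_json, _id|
    (wanted & JSON(tags_json.to_s).to_a).size >= 1
  end
end
```

For the wanted tag "ruby", a row tagged ["rubygems"] passes the coarse stage because the text contains "ruby", and is dropped by the exact stage.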
#full_each {|key, value| ... } ⇒ Documentrix::Documents::Cache::SQLiteCache
The full_each method iterates over all keys and values in the cache, regardless of their prefix.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 185
def full_each(&block)
  each(prefix: ?%, &block)
end
#key?(key) ⇒ FalseClass, TrueClass
The key? method checks if the given key exists in the cache by executing a SQL query.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 79
def key?(key)
  execute(
    %{ SELECT count(records.key) FROM records WHERE records.key = ? },
    pre(key)
  ).flatten.first == 1
end
#size ⇒ Integer
The size method returns the number of records stored in the cache, that is, the ones whose key starts with the cache's prefix.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 119
def size
  execute(%{SELECT COUNT(*) FROM records WHERE key LIKE ?}, [ "#@prefix%" ]).flatten.first
end
#tags ⇒ Documentrix::Utils::Tags
The tags method returns the unique tags of all records whose key starts with the cache's prefix, collected in a Documentrix::Utils::Tags object.
# File 'lib/documentrix/documents/cache/sqlite_cache.rb', line 104
def tags
  result = Documentrix::Utils::Tags.new
  execute(%{
      SELECT DISTINCT(tags) FROM records WHERE key LIKE ?
    }, [ "#@prefix%" ]
  ).flatten.each do
    JSON(_1).each { |t| result.add(t) }
  end
  result
end
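DISTINCT operates on the whole JSON text of the tags column, so the same tag can still occur in several distinct column values; the duplicates disappear only when the parsed tags are added to the result collection. A minimal sketch of that merge, assuming the collection behaves like a set: a plain Set stands in for Documentrix::Utils::Tags, whose interface beyond add is not shown on this page, and merge_tag_columns is a hypothetical name.

```ruby
require 'json'
require 'set'

# Hypothetical stand-in for the loop in the tags method: parses each JSON
# tags value and adds its entries to one collection, dropping duplicates.
def merge_tag_columns(tag_columns)
  result = Set.new
  tag_columns.each do |tags_json|
    JSON(tags_json).each { |t| result.add(t) }
  end
  result
end
```

Given the distinct column values '["ruby","sqlite"]' and '["ruby"]', the merged result contains "ruby" and "sqlite" once each.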