Class: Gitgo::Index

Inherits:
Object
  • Object
show all
Defined in:
lib/gitgo/index.rb,
lib/gitgo/index/idx_file.rb,
lib/gitgo/index/sha_file.rb

Overview

Index provides an index of documents used to expedite searches. Index structures it’s data into a branch-specific directory structure:

.git/gitgo/refs/[branch]/index
|- filter
|  `- key
|     `- value
|
|- head
|- list
`- graph_map

The files contain the following data (in conceptual order):

head      The user commit sha at which the last reindex occured.
list      A list of H* packed shas representing all of the documents
          accessible by the gitgo branch. The index (idx) of the sha in
          list serves as an identifier for the sha in map and filters.
map       A list of L* packed idx pairs mapping a document to its graph
          head.
[value]   A list of L* packed idx that match the key-value pair. These
          lists act as filters in searches.

The packing format for each of the index files was chosen for performance; both to minimize the footprint of the file and optimize the usage of the file data. – Index also maintains a cache of temporary files that auto-expire after a certain period of time. The temporary files contain H* packed shas and represent the results of various queries, such as rev-lists. ++

Usage

Index files are used primarily to select documents based on various filters. For example, to select the shas for all comments tagged as ‘important’ you would do this:

index = Index.new('path')

comment   = index['type']['comment']
important = index['tag']['important']
selected  = comment & important

heads = selected.collect {|id| idx.map[id] }
shas  = heads.collect {|id| idx.list[id] }.uniq

The array operations are very quick because the filters are composed of integers, as is the map. The final step resolves the integers to shas, but this too is simply an array lookup. The select method encapsulates this logic:

shas = index.select(
  :all => {'type' => 'comment', 'tag' => 'important'},
  :map => true,
  :shas => true
)

Importantly, any of the index files can contain duplication without affecting the results of select; this allows new documents to be quickly added into a filter or appended to list/map. As needed or convenient, the index can compact itself and remove duplication.

Defined Under Namespace

Classes: IdxFile, ShaFile

Constant Summary collapse

HEAD =

A file containing the ref at which the last index was performed; used to determine when a reindex is required relative to some other ref.

'head'
MAP =

An idx file mapping shas to their graph heads

'map'
LIST =

A sha file listing indexed shas; the index of the sha is used as an identifer for the sha in all idx files.

'list'
FILTER =

The filter directory

'filter'

Instance Attribute Summary collapse

Instance Method Summary collapse

Constructor Details

#initialize(path) ⇒ Index

Returns a new instance of Index.



97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
# File 'lib/gitgo/index.rb', line 97

def initialize(path)
  @path = path
  @head_file = File.expand_path(HEAD, path)
  @map_file = File.expand_path(MAP, path)
  @list_file = File.expand_path(LIST, path)
  
  @cache = Hash.new do |key_hash, key|
    key_hash[key] = Hash.new do |value_hash, value|
      value_hash[value] = begin
        index = self.path(FILTER, key, value)
        File.file?(index) ? IdxFile.read(index) : []
      end
    end
  end
end

Instance Attribute Details

#cacheObject (readonly)

Returns an in-memory, self-filling cache of idx files



95
96
97
# File 'lib/gitgo/index.rb', line 95

def cache
  @cache
end

#head_fileObject (readonly)

The head file for self



86
87
88
# File 'lib/gitgo/index.rb', line 86

def head_file
  @head_file
end

#list_fileObject (readonly)

The list file for self



92
93
94
# File 'lib/gitgo/index.rb', line 92

def list_file
  @list_file
end

#map_fileObject (readonly)

The map file for self



89
90
91
# File 'lib/gitgo/index.rb', line 89

def map_file
  @map_file
end

Instance Method Details

#[](key) ⇒ Object

Returns cache, a self-filling hash of filter values. Be careful not to modify index[v] as it is the actual cache storage.



138
139
140
# File 'lib/gitgo/index.rb', line 138

def [](key)
  cache[key]
end

#all(*keys) ⇒ Object



226
227
228
229
230
231
232
233
234
235
# File 'lib/gitgo/index.rb', line 226

def all(*keys)
  results = []
  keys.collect do |key|
    values(key).each do |value|
      results.concat(cache[key][value])
    end
  end
  results.uniq!
  results
end

#assoc(source, target, type) ⇒ Object



162
163
164
165
166
167
168
169
# File 'lib/gitgo/index.rb', line 162

def assoc(source, target, type)
  source_idx = idx(source)
  target_idx = idx(target)
  map[target_idx] = source_idx
  tail_filter << source_idx unless type == :create
  delete_filter << source_idx if type == :delete
  self
end

#clearObject

Clears all index files, and the cache.



345
346
347
348
349
350
# File 'lib/gitgo/index.rb', line 345

def clear
  if File.exists?(path)
    FileUtils.rm_r(path)
  end
  reset
end

#compactObject



290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
# File 'lib/gitgo/index.rb', line 290

def compact
  # reindex shas in list, and create an idx map for updating idxs
  old_list = {}
  list.each {|sha| old_list[old_list.length] = sha }
  
  list.uniq!
  
  new_list = {}
  list.each {|sha| new_list[sha] = new_list.length}
  
  idx_map = {}
  old_list.each_pair {|idx, sha| idx_map[idx] = new_list[sha]}
  
  # update/deconvolute mapped idx values
  new_map = {}
  map.each_pair {|idx, head_idx| new_map[idx_map[idx]] = idx_map[head_idx] }
  new_map.keys.each {|idx| new_map[idx] = deconvolute(idx, new_map) || idx }
  
  @map = new_map
  
  # update filter values
  @cache.values.each do |value_hash|
    value_hash.values.each do |idxs|
      idxs.collect! {|idx| idx_map[idx] }.uniq!
    end
  end
  
  self
end

#create(source) ⇒ Object



171
172
173
174
175
176
177
178
179
180
181
# File 'lib/gitgo/index.rb', line 171

def create(source)
  source_idx = idx(source)

  head_idx = graph_head_idx(source_idx)
  unless head_idx.nil? || head_idx == source_idx
    raise "create graph fail: #{source} (already associated with graph #{list[head_idx]})"
  end

  map[source_idx] = source_idx
  self
end

#delete(source) ⇒ Object



191
192
193
194
195
196
# File 'lib/gitgo/index.rb', line 191

def delete(source)
  source_idx = idx(source)
  tail_filter << source_idx
  delete_filter << source_idx
  self
end

#each_idx(key, values) ⇒ Object



237
238
239
240
241
242
243
244
245
246
247
# File 'lib/gitgo/index.rb', line 237

def each_idx(key, values)
  unless values.respond_to?(:each)
    values = [values]
  end
  
  values.each do |value|
    cache[key][value].each do |idx|
      yield idx
    end
  end
end

#each_sha(key, values, filter = delete_filter) ⇒ Object



249
250
251
252
253
254
# File 'lib/gitgo/index.rb', line 249

def each_sha(key, values, filter=delete_filter)
  each_idx(key, values) do |idx|
    next if filter.include?(idx)
    yield list[idx]
  end
end

#graph_head_idx(idx) ⇒ Object

Return the graph head idx for the specified idx.



158
159
160
# File 'lib/gitgo/index.rb', line 158

def graph_head_idx(idx)
  deconvolute(idx, map)
end

#headObject

Returns the sha in the head_file, if it exists, and nil otherwise.



114
115
116
# File 'lib/gitgo/index.rb', line 114

def head
  File.exists?(head_file) ? File.open(head_file) {|io| io.read(40) } : nil
end

#idx(sha) ⇒ Object

Returns the idx for sha, as specified in list. If the sha is not in list then it is appended to list.



144
145
146
147
148
149
150
151
152
153
154
155
# File 'lib/gitgo/index.rb', line 144

def idx(sha)
  case
  when list[-1] == sha
    list.length - 1
  when idx = list.index(sha)
    idx
  else
    idx = list.length
    list << sha
    idx
  end
end

#join(key, *values) ⇒ Object



256
257
258
# File 'lib/gitgo/index.rb', line 256

def join(key, *values)
  values.collect {|value| cache[key][value] }.flatten
end

#keysObject

Returns a list of possible index keys.



199
200
201
202
203
204
205
206
207
208
209
210
# File 'lib/gitgo/index.rb', line 199

def keys
  keys = cache.keys
  
  Dir.glob(self.path(FILTER, '*')).select do |path|
    File.directory?(path)
  end.each do |path|
    keys << File.basename(path)
  end
  
  keys.uniq!
  keys
end


183
184
185
# File 'lib/gitgo/index.rb', line 183

def link(source, target)
  associate :link, source, target
end

#listObject

Returns the contents of list_file, as an array.



127
128
129
# File 'lib/gitgo/index.rb', line 127

def list
  @list ||= (File.exists?(list_file) ? ShaFile.read(list_file) : [])
end

#mapObject

Returns the contents of map_file, as a hash.



119
120
121
122
123
124
# File 'lib/gitgo/index.rb', line 119

def map
  @map ||= begin
    entries = File.exists?(map_file) ? IdxFile.read(map_file) : []
    Hash[*entries]
  end
end

#path(*segments) ⇒ Object

Returns the segments joined to the path used to initialize self.



132
133
134
# File 'lib/gitgo/index.rb', line 132

def path(*segments)
  File.join(@path, *segments)
end

#resetObject

Clears the cache.



337
338
339
340
341
342
# File 'lib/gitgo/index.rb', line 337

def reset
  @list = nil
  @map = nil
  @cache.clear
  self
end

#select(options = {}) ⇒ Object



260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
# File 'lib/gitgo/index.rb', line 260

def select(options={})
  basis = options[:basis] || (0..list.length).to_a
  
  if all = options[:all]
    each_pair(all) do |key, value|
      basis = basis & cache[key][value]
      break if basis.empty?
    end
  end
  
  if any = options[:any]
    matches = []
    each_pair(any) do |key, value|
      matches.concat cache[key][value]
    end
    basis = basis & matches
  end
  
  if options[:map]
    basis.collect! {|idx| map[idx] }
  end
  
  if options[:shas]
    basis.collect! {|idx| list[idx] }
  end
  
  basis.uniq!
  basis
end

#update(source, target) ⇒ Object



187
188
189
# File 'lib/gitgo/index.rb', line 187

def update(source, target)
  associate :update, source, target
end

#values(key) ⇒ Object

Returns a list of possible values for the specified index key.



213
214
215
216
217
218
219
220
221
222
223
224
# File 'lib/gitgo/index.rb', line 213

def values(key)
  values = cache[key].keys
  
  base  = path(FILTER, key)
  start = base.length + 1
  Dir.glob("#{base}/**/*").each do |value_path|
    values << value_path[start, value_path.length-start]
  end
  
  values.uniq!
  values
end

#write(sha = nil) ⇒ Object

Writes cached changes.



321
322
323
324
325
326
327
328
329
330
331
332
333
334
# File 'lib/gitgo/index.rb', line 321

def write(sha=nil)
  @cache.each_pair do |key, value_hash|
    value_hash.each_pair do |value, idx|
      IdxFile.write(path(FILTER, key, value), idx)
    end
  end
  
  FileUtils.mkdir_p(path) unless File.exists?(path)
  File.open(head_file, "w") {|io| io.write(sha) } if sha
  ShaFile.write(list_file, list.join)
  IdxFile.write(map_file, map.to_a.flatten)
  
  self
end