Class: Gitgo::Repo

Inherits:
Object
  • Object
Defined in:
lib/gitgo/repo.rb,
lib/gitgo/repo/node.rb,
lib/gitgo/repo/graph.rb

Overview

Repo represents the internal data store used by Gitgo. Repos consist of a Git instance for storing documents in the repository, and an Index instance for queries on the documents. The internal workings of Repo are a bit complex; this document provides terminology and details on how documents and associations are stored. See Index and Graph for how document information is accessed.

Terminology

Gitgo documents are hashes of attributes that can be serialized as JSON. The Gitgo::Document model adds structure to these hashes and enforces data validity, but insofar as Repo is concerned, a document is a serializable hash. Documents are linked into a document graph – a directed acyclic graph (DAG) of document nodes that represent, for example, a chain of comments making a conversation. A given repo can be thought of as storing multiple DAGs, each made up of multiple documents.

The DAGs used by Gitgo are a little weird because they use some nodes to represent revisions and other nodes to represent the ‘current’ nodes in a graph (this setup allows documents to be immutable, and thereby to prevent merge conflicts).

Normally a DAG has this conceptual structure:

head
|
parent
|
node
|
child
|
tail

By contrast, the DAGs used by Gitgo are structured like this:

                        head
                        |
                        parent
                        |
original -> previous -> node -> update -> current version
                        |
                        child
                        |
                        tail

The extra dimension of updates may be unwound to replace all previous versions of a node with the current version(s), so for example:

a                       a
|                       |
b -> b'    becomes      b'
|                       |
c                       c

The full DAG is referred to as the ‘convoluted graph’ and the current DAG is the ‘deconvoluted graph’. The logic performing the deconvolution is encapsulated in Graph and Node.
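
The unwinding described above can be sketched on plain hashes. This is an illustration only and not the Graph/Node implementation; the links and updates tables and the current_versions and deconvolute helpers are hypothetical names chosen for the example.

```ruby
# Hypothetical in-memory association tables (not the Gitgo storage format).
links   = { "a" => ["b"], "b" => ["c"] }   # parent   => children
updates = { "b" => ["b'"] }                # previous => updates

# Follows update associations until reaching nodes that have no updates.
def current_versions(node, updates)
  targets = updates.fetch(node, [])
  return [node] if targets.empty?
  targets.flat_map { |target| current_versions(target, updates) }
end

# Builds the deconvoluted links: every node is replaced by its current
# version(s), and children linked to an earlier version are carried over.
def deconvolute(links, updates)
  result = Hash.new { |hash, key| hash[key] = [] }
  links.each do |parent, children|
    current_versions(parent, updates).each do |current_parent|
      children.each do |child|
        result[current_parent].concat(current_versions(child, updates))
      end
    end
  end
  result.each_value(&:uniq!)
  result
end

deconvoluted = deconvolute(links, updates)
```

With the tables above, b no longer appears in the result; a links to b' and b' links to c, matching the diagram.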

Parent-child associations are referred to as links, while previous-update associations are referred to as updates. Links and updates are collectively referred to as associations.

There are two additional types of associations: create and delete. Create associations occur when a sha is associated with the empty sha (i.e. the sha for an empty document). These associations place new documents along a path in the repo when the new document isn’t a child or update. Deletes associate a sha with itself; these act as a break in the DAG such that all subsequent links and updates are omitted.

The first member in an association (parent/previous/sha) is a source and the second (child/update/sha) is a target.

Storage

Documents are stored on a dedicated git branch in a way that prevents merge conflicts and allows merges to directly add nodes anywhere in a document graph. The branch may be checked out and handled like any other git branch, although typically users manage the gitgo branch through Gitgo itself.

Individual documents are stored with their associations along sha-based paths like ‘so/urce/target’ where the source is split into substrings of length 2 and 38. The mode and the relationship of the source-target shas determine the type of association involved. The logic breaks down like this (‘-’ refers to the empty sha, and a/b to different shas):

source   target   mode   type
a        -        644    create
a        b        644    link
a        b        640    update
a        a        644    delete

Using this system, a traversal of the associations is enough to determine how documents are related in a graph without loading documents into memory.
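
As a rough sketch of the table above, the type of an association can be decided from the source, target, and mode alone. The assoc_path and classify helpers and the constant names below are assumptions made for this example; the real logic lives in sha_path and assoc_type.

```ruby
# Git's sha for an empty blob, standing in for the empty sha ('-' in the table).
EMPTY_BLOB_SHA = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
LINK_MODE      = :"100644"   # same value as DEFAULT_MODE
REVISION_MODE  = :"100640"   # same value as UPDATE_MODE

# Splits the source sha into 2/38 segments and appends the target.
def assoc_path(source, target)
  [source[0, 2], source[2, 38], target].join("/")
end

# Mirrors the table: mode plus the source/target relationship give the type.
def classify(source, target, mode)
  case mode
  when LINK_MODE
    if target == EMPTY_BLOB_SHA then :create
    elsif target == source      then :delete
    else :link
    end
  when REVISION_MODE then :update
  else :invalid
  end
end

a = "a" * 40
b = "b" * 40

types = [
  classify(a, EMPTY_BLOB_SHA, LINK_MODE),
  classify(a, b, LINK_MODE),
  classify(a, b, REVISION_MODE),
  classify(a, a, LINK_MODE)
]
```

Here types holds one entry per row of the table, in the same order.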

Implementation Note

Repo is organized around an env hash that represents the rack env for a particular request. Objects used by Repo are cached into env for re-use across multiple requests, when possible. The ‘gitgo.*’ constants are used to identify cached objects.

Repo knows how to initialize all the objects it uses. An empty env or a partially filled env may be used to initialize a Repo.

Defined Under Namespace

Classes: Graph, Node

Constant Summary

ENVIRONMENT =
'gitgo.env'
PATH =
'gitgo.path'
OPTIONS =
'gitgo.options'
GIT =
'gitgo.git'
INDEX =
'gitgo.index'
REPO =
'gitgo.repo'
CACHE =
'gitgo.cache'
DOCUMENT_PATH =

Matches a path – ‘ab/xyz/sha’. After the match:

$1: ab
$2: xyz
$3: sha
/^(.{2})\/(.{38})\/(.{40})$/
DEFAULT_MODE =

The default blob mode used for added blobs

'100644'.to_sym
UPDATE_MODE =

The blob mode used to identify updates

'100640'.to_sym
FILE =
'gitgo'
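
A short illustration of how the DOCUMENT_PATH pattern splits an association path. The pattern is copied from the constant above so the example stands alone; the two shas are made-up 40-character strings, not real objects.

```ruby
# Same pattern as DOCUMENT_PATH, copied so the example is self-contained.
document_path = /^(.{2})\/(.{38})\/(.{40})$/

source = "0123456789abcdef0123456789abcdef01234567"
target = "fedcba9876543210fedcba9876543210fedcba98"
path   = "#{source[0, 2]}/#{source[2, 38]}/#{target}"

# MatchData#captures returns the three groups ($1, $2, $3) as an array.
ab, xyz, sha = document_path.match(path).captures

# Paths that are not split 2/38/40, such as the 'gitgo' marker file, do not match.
no_match = document_path.match("gitgo")
```

Joining ab and xyz gives back the source sha, and sha is the target.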

Constructor Details

#initialize(env = {}) ⇒ Repo

Initializes a new Repo with the specified env.



# File 'lib/gitgo/repo.rb', line 185

def initialize(env={})
  @env = env
end

Instance Attribute Details

#env ⇒ Object (readonly)

The repo env, typically the same as a request env.



# File 'lib/gitgo/repo.rb', line 182

def env
  @env
end

Class Method Details

.current ⇒ Object

Returns the current Repo, i.e. env[REPO]. Initializes and caches a new Repo in env if env[REPO] is not set.



# File 'lib/gitgo/repo.rb', line 152

def current
  env[REPO] ||= new(env)
end

.env ⇒ Object

The thread-specific env currently in scope (see set_env). The env stores all the objects used by a Repo and typically represents the rack-env for a specific server request.

Raises an error if no env is in scope.



# File 'lib/gitgo/repo.rb', line 146

def env
  Thread.current[ENVIRONMENT] or raise("no env in scope")
end

.init(path, options = {}) ⇒ Object

Initializes a new Repo to the git repository at the specified path. Options are the same as for Git.init.



# File 'lib/gitgo/repo.rb', line 119

def init(path, options={})
  git = Git.init(path, options)
  new(GIT => git)
end

.set_env(env) ⇒ Object

Sets env as the thread-specific env and returns the previously set env.



# File 'lib/gitgo/repo.rb', line 125

def set_env(env)
  current = Thread.current[ENVIRONMENT]
  Thread.current[ENVIRONMENT] = env
  current
end

.with_env(env) ⇒ Object

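Sets env as the thread-specific env for the duration of the block, and restores the previously set env afterwards (even if the block raises).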



# File 'lib/gitgo/repo.rb', line 132

def with_env(env)
  begin
    current = set_env(env)
    yield
  ensure
    set_env(current)
  end
end
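
The scoping behaviour of set_env, env, and with_env can be tried in isolation with a small stand-in. EnvScope and its key are hypothetical names for this sketch; only the pattern is taken from the methods above.

```ruby
# Stand-in for the class-level env handling shown above.
module EnvScope
  KEY = "example.env"

  # Sets env for the current thread and returns the previously set env.
  def self.set_env(env)
    previous = Thread.current[KEY]
    Thread.current[KEY] = env
    previous
  end

  # Returns the env in scope, or raises if none has been set.
  def self.env
    Thread.current[KEY] or raise("no env in scope")
  end

  # Restores the previous env even if the block raises.
  def self.with_env(env)
    previous = set_env(env)
    yield
  ensure
    set_env(previous)
  end
end

seen = []
EnvScope.with_env({ "name" => "outer" }) do
  seen << EnvScope.env["name"]
  EnvScope.with_env({ "name" => "inner" }) { seen << EnvScope.env["name"] }
  seen << EnvScope.env["name"]   # the outer env is back in scope here
end
```

Because the env is stored in Thread.current, each thread sees only the env it set itself, and nested blocks unwind in order.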

Instance Method Details

#[](sha) ⇒ Object

Returns the cached attrs hash for the specified sha, or nil.



# File 'lib/gitgo/repo.rb', line 237

def [](sha)
  cache[sha]
end

#[]=(sha, attrs) ⇒ Object

Sets the cached attrs for the specified sha.



# File 'lib/gitgo/repo.rb', line 242

def []=(sha, attrs)
  cache[sha] = attrs
end

#assoc_mode(source, target) ⇒ Object

Returns the mode of the specified association.



# File 'lib/gitgo/repo.rb', line 348

def assoc_mode(source, target)
  tree = git.tree.subtree(sha_path(source))
  return nil unless tree

  mode, sha = tree[target]
  mode
end

#assoc_sha(source, target) ⇒ Object

Returns the operative sha in an association, i.e. the source in a create/delete association and the target in a link/update association.



# File 'lib/gitgo/repo.rb', line 339

def assoc_sha(source, target)
  case target
  when source    then source
  when empty_sha then source
  else target
  end
end

#assoc_type(source, target, mode = assoc_mode(source, target)) ⇒ Object

Returns the association type given the source, target, and mode.



# File 'lib/gitgo/repo.rb', line 357

def assoc_type(source, target, mode=assoc_mode(source, target))
  case mode
  when DEFAULT_MODE
    case target
    when empty_sha then :create
    when source    then :delete
    else :link
    end
  when UPDATE_MODE
    :update
  else
    :invalid
  end
end

#associations(source, sort = true) ⇒ Object

Returns a hash of associations for the source, mainly used as a convenience method during testing.



# File 'lib/gitgo/repo.rb', line 374

def associations(source, sort=true)
  associations = {}
  links = []
  updates = []
  
  each_assoc(source) do |sha, type|
    case type
    when :create, :delete
      associations[type] = true
    when :link
      links << sha
    when :update
      updates << sha
    end
  end
  
  unless links.empty?
    
    associations[:links] = links
  end
  
  unless updates.empty?
    updates.sort! if sort
    associations[:updates] = updates
  end
  
  associations
end

#branch ⇒ Object



# File 'lib/gitgo/repo.rb', line 193

def branch
  git.branch
end

#branch?(sha) ⇒ Boolean

Returns true if the given commit has an empty ‘gitgo’ file in its tree.

Returns:

  • (Boolean)


# File 'lib/gitgo/repo.rb', line 260

def branch?(sha)
  return false if sha.nil?
  return false unless sha = resolve(sha)
  return false unless commit = git.get(:commit, sha)
  
  blob = commit.tree/FILE
  blob && blob.data.empty? ? true : false
end

#cache ⇒ Object

Returns or initializes a self-populating cache of attribute hashes in env. Attribute hashes are keyed by sha.



# File 'lib/gitgo/repo.rb', line 232

def cache
  env[CACHE] ||= Hash.new {|hash, sha| hash[sha] = read(sha) }
end
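
The self-populating behaviour comes from the block form of Hash.new. A minimal sketch, using a lambda as a stand-in for read (the loader and reads names are assumptions for this example):

```ruby
reads = []

# Stand-in for Repo#read; records each sha it is asked to load.
loader = lambda do |sha|
  reads << sha
  { "sha" => sha }
end

# The default block runs only on a miss and stores the result under the key.
cache = Hash.new { |hash, sha| hash[sha] = loader.call(sha) }

first  = cache["abc"]
second = cache["abc"]          # served from the hash; the loader is not called again
probed = cache.has_key?("xyz") # has_key? does not trigger the default block
```

Note that a nil result is stored as well, so a sha that does not deserialize is only read once.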

#checkout(branch) ⇒ Object



# File 'lib/gitgo/repo.rb', line 589

def checkout(branch)
  git.checkout(branch)
  self
end

#commit(msg = status) ⇒ Object

Commits any changes to git and writes the index to disk. The commit message is inferred from the status, if left unspecified. Commit will raise an error if there are no changes to commit.



# File 'lib/gitgo/repo.rb', line 538

def commit(msg=status)
  setup unless head
  
  sha = git.commit(msg)
  index.write(sha)
  sha
end

#commit!(msg = status) ⇒ Object

Same as commit but does not check if there are changes to commit, useful when you know there are changes to commit and don’t want the overhead of checking for changes.



# File 'lib/gitgo/repo.rb', line 549

def commit!(msg=status)
  setup unless head
  
  sha = git.commit!(msg)
  index.write(sha)
  sha
end

#create(sha) ⇒ Object

Creates a create association for the sha:

sh/a/empty_sha (DEFAULT_MODE, sha)


# File 'lib/gitgo/repo.rb', line 295

def create(sha)
  git[sha_path(sha, empty_sha)] = [DEFAULT_MODE, sha]
  sha
end

#delete(sha) ⇒ Object

Creates a delete association for the sha:

sh/a/sha (DEFAULT_MODE, empty_sha)


# File 'lib/gitgo/repo.rb', line 332

def delete(sha)
  git[sha_path(sha, sha)] = [DEFAULT_MODE, empty_sha]
  self
end

#diff(a, b, type = 'A') ⇒ Object

Returns a list of document shas that have been added (‘A’) between a and b. Deleted (‘D’) or modified (‘M’) documents can be specified using type.



# File 'lib/gitgo/repo.rb', line 479

def diff(a, b, type='A')
  if a == b || b.nil?
    return []
  end
  
  paths = a.nil? ? git.ls_tree(b) : git.diff_tree(a, b)[type]
  paths.collect! do |path|
    ab, xyz, target = path.split('/', 3)
    assoc_sha("#{ab}#{xyz}", target)
  end

  paths.compact!
  paths
end

#each ⇒ Object

Yields the sha of each document in the repo, in no particular order and with duplicates for every link/update that has multiple association sources.



# File 'lib/gitgo/repo.rb', line 419

def each
  git.tree.each_pair(true) do |ab, xyz_tree|
    next unless ab.length == 2
    
    xyz_tree.each_pair(true) do |xyz, target_tree|
      source = "#{ab}#{xyz}"
      
      target_tree.keys.each do |target|
        doc_sha = assoc_sha(source, target)
        yield(doc_sha) if doc_sha
      end
    end
  end
end

#each_assoc(source) ⇒ Object

Yields each association for source to the block, with the association sha and type. Returns self.



# File 'lib/gitgo/repo.rb', line 405

def each_assoc(source) # :yields: sha, type
  return self if source.nil?
  
  target_tree = git.tree.subtree(sha_path(source))
  target_tree.each_pair do |target, (mode, sha)|
    yield assoc_sha(source, target), assoc_type(source, target, mode)
  end if target_tree
  
  self
end

#empty_sha ⇒ Object

Returns the sha for an empty string, and ensures the corresponding object is set in the repo.



# File 'lib/gitgo/repo.rb', line 248

def empty_sha
  @empty_sha ||= git.set(:blob, '')
end

#git ⇒ Object

Returns the Git instance set in env. If no instance is set then one will be initialized using env[PATH] and env[OPTIONS].

Note that given the chain of defaults, git will be initialized to Dir.pwd if the env has no PATH or GIT set.



# File 'lib/gitgo/repo.rb', line 216

def git
  env[GIT] ||= Git.init(path, env[OPTIONS] || {})
end

#graph(sha) ⇒ Object

Initializes a Graph for the sha.



# File 'lib/gitgo/repo.rb', line 435

def graph(sha)
  Graph.new(self, sha)
end

#head ⇒ Object



# File 'lib/gitgo/repo.rb', line 189

def head
  git.head
end

#index ⇒ Object

Returns the Index instance set in env. If no instance is set then one will be initialized under the git working directory, specific to the git branch. For instance:

.git/gitgo/refs/branch/index


# File 'lib/gitgo/repo.rb', line 226

def index
  env[INDEX] ||= Index.new(File.join(git.work_dir, 'refs', git.branch, 'index'))
end

#link(parent, child) ⇒ Object

Creates a link association for parent and child:

pa/rent/child (DEFAULT_MODE, child)


# File 'lib/gitgo/repo.rb', line 314

def link(parent, child)
  git[sha_path(parent, child)] = [DEFAULT_MODE, child]
  self
end

#path ⇒ Object

Returns the path to the git repository. Path is determined from env[PATH], or inferred from env[GIT] and set in env[PATH]. The default path is Dir.pwd.



# File 'lib/gitgo/repo.rb', line 207

def path
  env[PATH] ||= (env.has_key?(GIT) ? env[GIT].path : Dir.pwd)
end
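
The chain of defaults can be exercised without a real repository. In this sketch resolve_path, PATH_KEY, GIT_KEY, and FakeGit are hypothetical stand-ins; only the one-line body is taken from the method above.

```ruby
PATH_KEY = "gitgo.path"   # same string as the PATH constant
GIT_KEY  = "gitgo.git"    # same string as the GIT constant

# Hypothetical stand-in for a Git instance; only #path is needed here.
FakeGit = Struct.new(:path)

# Same expression as Repo#path, taking env as an argument.
def resolve_path(env)
  env[PATH_KEY] ||= (env.has_key?(GIT_KEY) ? env[GIT_KEY].path : Dir.pwd)
end

explicit = resolve_path({ PATH_KEY => "/tmp/explicit" })
from_git = resolve_path({ GIT_KEY => FakeGit.new("/tmp/from_git") })

empty_env = {}
fallback  = resolve_path(empty_env)   # falls back to Dir.pwd and stores it in env
```

After the last call empty_env is no longer empty: the inferred path has been cached under the path key, which is how a partially filled env gets completed on first use.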

#read(sha) ⇒ Object

Reads and deserializes the specified hash of attrs. If sha does not indicate a blob that deserializes as JSON then read returns nil.



# File 'lib/gitgo/repo.rb', line 302

def read(sha)
  begin
    JSON.parse(git.get(:blob, sha).data)
  rescue JSON::ParserError, Errno::EISDIR 
    nil
  end
end
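
A standalone sketch of the parse-or-nil behaviour, working on blob data directly rather than on a sha. parse_attrs is a hypothetical name; the real method also fetches the blob from git and additionally rescues Errno::EISDIR.

```ruby
require "json"

# Stand-in for Repo#read that takes the blob data instead of a sha.
def parse_attrs(data)
  JSON.parse(data)
rescue JSON::ParserError
  nil
end

# Round trip: what save serializes with JSON.generate can be read back.
attrs   = { "type" => "comment", "content" => "hello" }
decoded = parse_attrs(JSON.generate(attrs))

# Data that is not JSON yields nil rather than an error.
invalid = parse_attrs("not json")
```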

#refs ⇒ Object

Returns an array of refs representing gitgo branches.



# File 'lib/gitgo/repo.rb', line 270

def refs
  select_branches(git.grit.refs)
end

#remotes ⇒ Object

Returns an array of remotes representing gitgo branches.



# File 'lib/gitgo/repo.rb', line 275

def remotes
  select_branches(git.grit.remotes)
end

#reset(full = false) ⇒ Object



# File 'lib/gitgo/repo.rb', line 594

def reset(full=false)
  git.reset(full)
  cache.clear
  index.reset
  self
end

#resolve(sha) ⇒ Object



# File 'lib/gitgo/repo.rb', line 201

def resolve(sha)
  git.resolve(sha) rescue sha
end

#rev_list(sha) ⇒ Object

Returns an array of revisions (commits) reachable from the sha. These revisions are cached for quick retrieval.



# File 'lib/gitgo/repo.rb', line 467

def rev_list(sha)
  sha = sha.to_sym
  unless cache.has_key?(sha)
    cache[sha] = git.rev_list(sha.to_s)
  end
  
  cache[sha]
end

#save(attrs) ⇒ Object

Serializes and sets the attributes as a blob in the git repo and caches the attributes by the blob sha. Returns the blob sha.

Note that save does not put the blob along a path in the repo; immediately after save the blob is hanging and will be gc’ed by git unless set into a path by create, link, or update.



# File 'lib/gitgo/repo.rb', line 285

def save(attrs)
  sha = git.set(:blob, JSON.generate(attrs))
  cache[sha] = attrs
  sha
end

#scope ⇒ Object

Sets self as the current Repo for the duration of the block.



# File 'lib/gitgo/repo.rb', line 602

def scope
  Repo.with_env(REPO => self) { yield }
end

#setup(upstream_branch = nil) ⇒ Object



# File 'lib/gitgo/repo.rb', line 557

def setup(upstream_branch=nil)
  if head
    raise "already setup on: #{branch} (#{head})"
  end
  
  if upstream_branch.nil? || upstream_branch.empty?
    tree = Git::Tree.new
    tree[FILE] = [git.default_blob_mode, empty_sha]
    mode, sha = tree.write_to(git)
    git.commit!("setup gitgo", :tree => sha)
    
    current_tree = git.tree
    git.reset
    git.tree.merge!(current_tree)
    
    return self
  end
  
  unless branch?(upstream_branch)
    raise "not a gitgo branch: #{upstream_branch.inspect}"
  end
  
  if git.tracking_branch?(upstream_branch)
    git.track(upstream_branch)
  end
  git.merge(upstream_branch)
  
  cache.clear
  index.reset
  self
end

#sha_path(sha, *paths) ⇒ Object

Creates a nested sha path like: ab/xyz/paths



# File 'lib/gitgo/repo.rb', line 253

def sha_path(sha, *paths)
  paths.unshift sha[2,38]
  paths.unshift sha[0,2]
  paths
end
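
For illustration, the method body above can be copied into a standalone function and called with made-up shas. Note that the result is an array of path segments rather than a joined string.

```ruby
# Copy of the method body above as a top-level function.
def sha_path(sha, *paths)
  paths.unshift sha[2, 38]
  paths.unshift sha[0, 2]
  paths
end

source = "a" * 40
target = "b" * 40

# Segments for a full association path (source split 2/38, then the target).
segments = sha_path(source, target)

# With no extra paths the result addresses the tree holding all targets of
# source, which is the form used by assoc_mode and each_assoc.
tree_segments = sha_path(source)
```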

#status ⇒ Object

Generates a status message based on currently uncommitted changes.



# File 'lib/gitgo/repo.rb', line 495

def status
  unless block_given?
    return status {|sha| sha}
  end
  
  lines = []
  git.status.each_pair do |path, state|
    ab, xyz, target = path.split('/', 3)
    source = "#{ab}#{xyz}"
    
    sha  = assoc_sha(source, target)
    type = assoc_type(source, target)
    
    status = case assoc_type(source, target)
    when :create
      type = self[sha]['type']
      [type || 'doc', yield(sha)]
    when :link
      ['link', "#{yield(source)} to  #{yield(target)}"]
    when :update
      ['update', "#{yield(target)} was #{yield(source)}"]
    when :delete
      ['delete', yield(sha)]
    else
      ['unknown', path]
    end
    
    if status
      status.unshift state_str(state)
      lines << status
    end
  end
  
  indent = lines.collect {|(state, type, msg)| type.length }.max
  format = "%s %-#{indent}s %s"
  lines.collect! {|ary| format % ary }
  lines.sort!
  lines.join("\n")
end

#timeline(options = {}) ⇒ Object

Returns an array of shas representing recent documents added.



# File 'lib/gitgo/repo.rb', line 440

def timeline(options={})
  options = {:n => 10, :offset => 0}.merge(options)
  offset = options[:offset]
  n = options[:n]

  shas = []
  return shas if n <= 0
  
  dates = index.values('date').sort.reverse
  index.each_sha('date', dates) do |sha|
    if block_given?
      next unless yield(sha)
    end
    
    if offset > 0
      offset -= 1
    else
      shas << sha
      break if n && shas.length == n
    end
  end
  
  shas
end
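
The filter, offset, and limit handling in timeline can be sketched against a plain array. The page helper below is a hypothetical stand-in that assumes its input is already sorted newest first; the real method takes its shas from the index.

```ruby
# Stand-in for the paging loop in timeline, operating on an array of shas.
def page(shas, n: 10, offset: 0)
  result = []
  return result if n <= 0

  shas.each do |sha|
    # An optional block filters shas before offset and limit are applied.
    next if block_given? && !yield(sha)

    if offset > 0
      offset -= 1            # skip matching entries until the offset is used up
    else
      result << sha
      break if result.length == n
    end
  end

  result
end

recent = %w[s1 s2 s3 s4 s5 s6]   # assumed to be sorted newest first

second_page = page(recent, n: 2, offset: 2)
even_only   = page(recent, n: 2, offset: 1) { |sha| sha[-1].to_i.even? }
```

Because the filter runs first, the offset counts only shas the block accepts, so paging stays consistent for a filtered timeline.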

#update(old_sha, new_sha) ⇒ Object

Creates an update association for old and new shas:

ol/d_sha/new_sha (UPDATE_MODE, new_sha)


# File 'lib/gitgo/repo.rb', line 323

def update(old_sha, new_sha)
  git[sha_path(old_sha, new_sha)] = [UPDATE_MODE, new_sha]
  self
end

#upstream_branch ⇒ Object



# File 'lib/gitgo/repo.rb', line 197

def upstream_branch
  git.upstream_branch
end