Class: Gitgo::Repo
- Inherits: Object
  - Object
    - Gitgo::Repo
- Defined in: lib/gitgo/repo.rb, lib/gitgo/repo/node.rb, lib/gitgo/repo/graph.rb
Overview
Repo represents the internal data store used by Gitgo. Repos consist of a Git instance for storing documents in the repository, and an Index instance for queries on the documents. The internal workings of Repo are a bit complex; this document provides terminology and details on how documents and associations are stored. See Index and Graph for how document information is accessed.
Terminology
Gitgo documents are hashes of attributes that can be serialized as JSON. The Gitgo::Document model adds structure to these hashes and enforces data validity, but insofar as Repo is concerned, a document is a serializable hash. Documents are linked into a document graph – a directed acyclic graph (DAG) of document nodes that represent, for example, a chain of comments making a conversation. A given repo can be thought of as storing multiple DAGs, each made up of multiple documents.
The DAGs used by Gitgo are a little unusual: some nodes represent revisions and other nodes represent the ‘current’ nodes in a graph. This setup allows documents to be immutable, which in turn prevents merge conflicts.
Normally a DAG has this conceptual structure:
head
|
parent
|
node
|
child
|
tail
By contrast, the DAGs used by Gitgo are structured like this:
head
|
parent
|
original -> previous -> node -> update -> current version
|
child
|
tail
The extra dimension of updates may be unwound to replace all previous versions of a node with the current version(s), so for example:
a                 a
|                 |
b -> b'  becomes  b'
|                 |
c                 c
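The unwinding above can be sketched in a few lines of Ruby. This is a simplified illustration, not Gitgo's Graph or Node code: it assumes hypothetical `links` and `updates` hashes that map a source sha to an array of target shas, assumes update chains are acyclic, and ignores delete associations. Each link is rewritten so that both of its ends point at the current version(s) of their node.

```ruby
# Hypothetical data model: links and updates map a source sha to an array of
# target shas. Deletes are ignored and update chains are assumed acyclic.

# Follows updates forward from sha to the current version(s), i.e. the
# versions that have not themselves been updated.
def current_versions(sha, updates)
  nexts = updates.fetch(sha, [])
  return [sha] if nexts.empty?

  nexts.flat_map { |n| current_versions(n, updates) }.uniq
end

# Rewrites every link so that both ends point at current versions.
def deconvolute(links, updates)
  graph = Hash.new { |hash, key| hash[key] = [] }
  links.each do |parent, children|
    current_versions(parent, updates).each do |cur_parent|
      children.each do |child|
        graph[cur_parent].concat(current_versions(child, updates))
      end
      graph[cur_parent].uniq!
    end
  end
  graph
end

# The example from the diagram: a -> b -> c, with b updated to b'.
links   = { 'a' => ['b'], 'b' => ['c'] }
updates = { 'b' => ["b'"] }
graph   = deconvolute(links, updates)
```

With these inputs, `a` ends up linked to `b'`, `b'` is linked to `c`, and `b` no longer appears in the graph.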
The full DAG is referred to as the ‘convoluted graph’ and the current DAG as the ‘deconvoluted graph’. The logic performing the deconvolution is encapsulated in Graph and Node.
Parent-child associations are referred to as links, while previous-update associations are referred to as updates. Links and updates are collectively referred to as associations.
There are two additional types of associations: create and delete. Create associations occur when a sha is associated with the empty sha (i.e. the sha of the empty blob). These associations place new documents along a path in the repo when the new document isn’t a child or update. Delete associations associate a sha with itself; they act as a break in the DAG such that all subsequent links and updates are omitted.
The first member in an association (parent/previous/sha) is a source and the second (child/update/sha) is a target.
Storage
Documents are stored on a dedicated git branch in a way that prevents merge conflicts and allows merges to directly add nodes anywhere in a document graph. The branch may be checked out and handled like any other git branch, although typically users manage the gitgo branch through Gitgo itself.
Individual documents are stored with their associations along sha-based paths like ‘so/urce/target’ where the source is split into substrings of length 2 and 38. The mode and the relationship of the source-target shas determine the type of association involved. The logic breaks down like this (‘-’ refers to the empty sha, and a/b to different shas):
source  target  mode  type
a       -       644   create
a       b       644   link
a       b       640   update
a       a       644   delete
Using this system, a traversal of the associations is enough to determine how documents are related in a graph, without loading the documents into memory.
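As a sketch of the table above, the classification can be written as a small Ruby function. It closely follows Repo#assoc_type, but `association_type` and `EMPTY_SHA` are illustrative names. The empty sha used here is the sha git assigns to an empty blob, which plays the role of ‘-’ in the table.

```ruby
# Mode constants as listed in the Constant Summary.
DEFAULT_MODE = '100644'.to_sym
UPDATE_MODE  = '100640'.to_sym

# The sha git assigns to an empty blob ('-' in the table).
EMPTY_SHA = 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391'

# Classifies an association from its source, target, and mode.
def association_type(source, target, mode)
  case mode
  when DEFAULT_MODE
    if target == EMPTY_SHA
      :create
    elsif target == source
      :delete
    else
      :link
    end
  when UPDATE_MODE
    :update
  else
    :invalid
  end
end

a = 'a' * 40
b = 'b' * 40
association_type(a, EMPTY_SHA, DEFAULT_MODE) # => :create
association_type(a, b, DEFAULT_MODE)         # => :link
association_type(a, b, UPDATE_MODE)          # => :update
association_type(a, a, DEFAULT_MODE)         # => :delete
```

Because only the shas and the mode are inspected, this decision never needs the document contents.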
Implementation Note
Repo is organized around an env hash that represents the rack env for a particular request. Objects used by Repo are cached into env for re-use across multiple requests, when possible. The ‘gitgo.*’ constants are used to identify cached objects.
Repo knows how to initialize all the objects it uses. An empty env or a partially filled env may be used to initialize a Repo.
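A minimal sketch of this caching pattern, assuming a hypothetical `EnvBackedRepo` class that manages only a path and a cache (the real Repo applies the same idea to its Git, Index, and other objects). Each accessor stores what it builds in env with `||=`, so a second instance created from the same, partially filled env reuses those objects.

```ruby
# Hypothetical stand-in for Repo that keeps what it builds in env, keyed by
# 'gitgo.*' strings, so instances sharing an env share those objects.
class EnvBackedRepo
  PATH  = 'gitgo.path'
  CACHE = 'gitgo.cache'

  attr_reader :env

  def initialize(env = {})
    @env = env
  end

  # Uses a path already present in env, else defaults to the current directory.
  def path
    env[PATH] ||= Dir.pwd
  end

  # Lazily creates the cache and stores it in env for reuse.
  def cache
    env[CACHE] ||= {}
  end
end

env = {}
first = EnvBackedRepo.new(env)
first.cache['sha'] = { 'title' => 'cached' }

second = EnvBackedRepo.new(env)           # built from the now partially filled env
second.cache.equal?(first.cache)          # => true
EnvBackedRepo.new({ EnvBackedRepo::PATH => '/srv/repo' }).path # => "/srv/repo"
```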
Defined Under Namespace
Constant Summary
- ENVIRONMENT = 'gitgo.env'
- PATH = 'gitgo.path'
- OPTIONS = 'gitgo.options'
- GIT = 'gitgo.git'
- INDEX = 'gitgo.index'
- REPO = 'gitgo.repo'
- CACHE = 'gitgo.cache'
- DOCUMENT_PATH = /^(.{2})\/(.{38})\/(.{40})$/
  Matches a path like ‘ab/xyz/sha’. After the match:
    $1:: ab
    $2:: xyz
    $3:: sha
- DEFAULT_MODE = '100644'.to_sym
  The default blob mode used for added blobs.
- UPDATE_MODE = '100640'.to_sym
  The blob mode used to identify updates.
- FILE = 'gitgo'
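The DOCUMENT_PATH pattern can be exercised with made-up shas. The path below is built the same way sha_path builds it (the first 2 characters, the next 38, then the target); match groups 1 and 2 rejoin to the source sha and group 3 is the target.

```ruby
DOCUMENT_PATH = /^(.{2})\/(.{38})\/(.{40})$/

# Made-up 40-character shas for illustration.
source = '0123456789abcdef0123456789abcdef01234567'
target = 'fedcba9876543210fedcba9876543210fedcba98'
path   = "#{source[0, 2]}/#{source[2, 38]}/#{target}"

match = DOCUMENT_PATH.match(path)
match[1] + match[2] == source    # => true
match[3] == target               # => true
DOCUMENT_PATH.match('ab/cd/ef')  # => nil
```

In Ruby, `^` and `$` anchor at line boundaries; `\A` and `\z` would be stricter if a path could ever contain a newline.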
Instance Attribute Summary
- #env ⇒ Object (readonly)
  The repo env, typically the same as a request env.
Class Method Summary
- .current ⇒ Object
  Returns the current Repo, i.e. env[REPO].
- .env ⇒ Object
  The thread-specific env currently in scope (see set_env).
- .init(path, options = {}) ⇒ Object
  Initializes a new Repo to the git repository at the specified path.
- .set_env(env) ⇒ Object
  Sets env as the thread-specific env and returns the previously set env.
- .with_env(env) ⇒ Object
  Sets env for the block.
Instance Method Summary
- #[](sha) ⇒ Object
  Returns the cached attrs hash for the specified sha, or nil.
- #[]=(sha, attrs) ⇒ Object
  Sets the cached attrs for the specified sha.
- #assoc_mode(source, target) ⇒ Object
  Returns the mode of the specified association.
- #assoc_sha(source, target) ⇒ Object
  Returns the operative sha in an association, i.e. the source in a create/delete association and the target in a link/update association.
- #assoc_type(source, target, mode = assoc_mode(source, target)) ⇒ Object
  Returns the association type given the source, target, and mode.
- #associations(source, sort = true) ⇒ Object
  Returns a hash of associations for the source, mainly used as a convenience method during testing.
- #branch ⇒ Object
- #branch?(sha) ⇒ Boolean
  Returns true if the given commit has an empty ‘gitgo’ file in its tree.
- #cache ⇒ Object
  Returns or initializes a self-populating cache of attribute hashes in env.
- #checkout(branch) ⇒ Object
- #commit(msg = status) ⇒ Object
  Commits any changes to git and writes the index to disk.
- #commit!(msg = status) ⇒ Object
  Same as commit, but does not check whether there are changes to commit; useful when you know there are changes and want to avoid the overhead of checking.
- #create(sha) ⇒ Object
  Creates a create association for the sha.
- #delete(sha) ⇒ Object
  Creates a delete association for the sha.
- #diff(a, b, type = 'A') ⇒ Object
  Returns a list of document shas that have been added (‘A’) between a and b.
- #each ⇒ Object
  Yields the sha of each document in the repo, in no particular order and with duplicates for every link/update that has multiple association sources.
- #each_assoc(source) ⇒ Object
  Yields each association for source to the block, with the association sha and type.
- #empty_sha ⇒ Object
  Returns the sha for an empty string, and ensures the corresponding object is set in the repo.
- #git ⇒ Object
  Returns the Git instance set in env.
- #graph(sha) ⇒ Object
  Initializes a Graph for the sha.
- #head ⇒ Object
- #index ⇒ Object
  Returns the Index instance set in env.
- #initialize(env = {}) ⇒ Repo (constructor)
  Initializes a new Repo with the specified env.
- #link(parent, child) ⇒ Object
  Creates a link association for parent and child.
- #path ⇒ Object
  Returns the path to the git repository.
- #read(sha) ⇒ Object
  Reads and deserializes the specified hash of attrs.
- #refs ⇒ Object
  Returns an array of refs representing gitgo branches.
- #remotes ⇒ Object
  Returns an array of remotes representing gitgo branches.
- #reset(full = false) ⇒ Object
- #resolve(sha) ⇒ Object
- #rev_list(sha) ⇒ Object
  Returns an array of revisions (commits) reachable from the sha.
- #save(attrs) ⇒ Object
  Serializes and sets the attributes as a blob in the git repo and caches the attributes by the blob sha.
- #scope ⇒ Object
  Sets self as the current Repo for the duration of the block.
- #setup(upstream_branch = nil) ⇒ Object
- #sha_path(sha, *paths) ⇒ Object
  Creates a nested sha path like: ab/xyz/paths.
- #status ⇒ Object
  Generates a status message based on currently uncommitted changes.
- #timeline(options = {}) ⇒ Object
  Returns an array of shas representing recently added documents.
- #update(old_sha, new_sha) ⇒ Object
  Creates an update association for old and new shas.
- #upstream_branch ⇒ Object
Constructor Details
#initialize(env = {}) ⇒ Repo
Initializes a new Repo with the specified env.
    # File 'lib/gitgo/repo.rb', line 185
    def initialize(env={})
      @env = env
    end
Instance Attribute Details
#env ⇒ Object (readonly)
The repo env, typically the same as a request env.
    # File 'lib/gitgo/repo.rb', line 182
    def env
      @env
    end
Class Method Details
.current ⇒ Object
    # File 'lib/gitgo/repo.rb', line 152
    def current
      env[REPO] ||= new(env)
    end
.env ⇒ Object
The thread-specific env currently in scope (see set_env). The env stores all the objects used by a Repo and typically represents the rack-env for a specific server request.
Raises an error if no env is in scope.
    # File 'lib/gitgo/repo.rb', line 146
    def env
      Thread.current[ENVIRONMENT] or raise("no env in scope")
    end
.init(path, options = {}) ⇒ Object
Initializes a new Repo to the git repository at the specified path. Options are the same as for Git.init.
    # File 'lib/gitgo/repo.rb', line 119
    def init(path, options={})
      git = Git.init(path, options)
      new(GIT => git)
    end
.set_env(env) ⇒ Object
Sets env as the thread-specific env and returns the previously set env.
    # File 'lib/gitgo/repo.rb', line 125
    def set_env(env)
      current = Thread.current[ENVIRONMENT]
      Thread.current[ENVIRONMENT] = env
      current
    end
.with_env(env) ⇒ Object
Sets env for the block.
    # File 'lib/gitgo/repo.rb', line 132
    def with_env(env)
      begin
        current = set_env(env)
        yield
      ensure
        set_env(current)
      end
    end
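The thread-local scoping performed by set_env, env, and with_env can be reproduced in isolation. The `ThreadEnv` module below is a hypothetical stand-in that mirrors the code above; it shows that a nested with_env restores the outer env when its block finishes and that other threads do not see the env at all.

```ruby
# Hypothetical module reproducing the thread-local env pattern used by
# Repo.set_env, Repo.env, and Repo.with_env.
module ThreadEnv
  KEY = 'gitgo.env'

  # Installs env for the current thread and returns the previous one.
  def self.set_env(env)
    previous = Thread.current[KEY]
    Thread.current[KEY] = env
    previous
  end

  def self.env
    Thread.current[KEY] or raise('no env in scope')
  end

  # Installs env for the duration of the block, restoring the previous env
  # even if the block raises.
  def self.with_env(env)
    previous = set_env(env)
    yield
  ensure
    set_env(previous)
  end
end

inner = outer = other = nil
ThreadEnv.with_env({ 'name' => 'outer' }) do
  ThreadEnv.with_env({ 'name' => 'inner' }) { inner = ThreadEnv.env['name'] }
  outer = ThreadEnv.env['name']                              # restored to 'outer'
  other = Thread.new { Thread.current[ThreadEnv::KEY] }.value # nil in a new thread
end
```

Strictly speaking, `Thread#[]` stores fiber-local values; `Thread#thread_variable_set` is the thread-wide alternative if fibers are involved.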
Instance Method Details
#[](sha) ⇒ Object
Returns the cached attrs hash for the specified sha, or nil.
    # File 'lib/gitgo/repo.rb', line 237
    def [](sha)
      cache[sha]
    end
#[]=(sha, attrs) ⇒ Object
Sets the cached attrs for the specified sha.
    # File 'lib/gitgo/repo.rb', line 242
    def []=(sha, attrs)
      cache[sha] = attrs
    end
#assoc_mode(source, target) ⇒ Object
Returns the mode of the specified association.
    # File 'lib/gitgo/repo.rb', line 348
    def assoc_mode(source, target)
      tree = git.tree.subtree(sha_path(source))
      return nil unless tree

      mode, sha = tree[target]
      mode
    end
#assoc_sha(source, target) ⇒ Object
Returns the operative sha in an association, i.e. the source in a create/delete association and the target in a link/update association.
    # File 'lib/gitgo/repo.rb', line 339
    def assoc_sha(source, target)
      case target
      when source    then source
      when empty_sha then source
      else target
      end
    end
#assoc_type(source, target, mode = assoc_mode(source, target)) ⇒ Object
Returns the association type given the source, target, and mode.
    # File 'lib/gitgo/repo.rb', line 357
    def assoc_type(source, target, mode=assoc_mode(source, target))
      case mode
      when DEFAULT_MODE
        case target
        when empty_sha then :create
        when source    then :delete
        else :link
        end
      when UPDATE_MODE
        :update
      else
        :invalid
      end
    end
#associations(source, sort = true) ⇒ Object
Returns a hash of associations for the source, mainly used as a convenience method during testing.
    # File 'lib/gitgo/repo.rb', line 374
    def associations(source, sort=true)
      associations = {}
      links = []
      updates = []

      each_assoc(source) do |sha, type|
        case type
        when :create, :delete
          associations[type] = true
        when :link
          links << sha
        when :update
          updates << sha
        end
      end

      unless links.empty?
        associations[:links] = links
      end

      unless updates.empty?
        updates.sort! if sort
        associations[:updates] = updates
      end

      associations
    end
#branch ⇒ Object
    # File 'lib/gitgo/repo.rb', line 193
    def branch
      git.branch
    end
#branch?(sha) ⇒ Boolean
Returns true if the given commit has an empty ‘gitgo’ file in its tree.
    # File 'lib/gitgo/repo.rb', line 260
    def branch?(sha)
      return false if sha.nil?
      return false unless sha = resolve(sha)
      return false unless commit = git.get(:commit, sha)

      blob = commit.tree/FILE
      blob && blob.data.empty? ? true : false
    end
#cache ⇒ Object
Returns or initializes a self-populating cache of attribute hashes in env. Attribute hashes are keyed by sha.
    # File 'lib/gitgo/repo.rb', line 232
    def cache
      env[CACHE] ||= Hash.new {|hash, sha| hash[sha] = read(sha) }
    end
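The self-populating cache relies on Hash.new with a default block, which runs only for missing keys. The sketch below assumes a hypothetical `BLOBS` hash in place of git and a `reads` counter to show that each sha is read at most once; as with Repo#read, blobs that are not valid JSON are cached as nil.

```ruby
require 'json'

# Hypothetical blob store standing in for git: sha => serialized attributes.
BLOBS = { 'sha1' => '{"title":"first"}', 'sha2' => 'not json' }

reads = 0

# The default block runs only for missing keys, so each sha is read at most
# once; unparseable or unknown blobs are stored as nil.
cache = Hash.new do |hash, sha|
  reads += 1
  hash[sha] = begin
    JSON.parse(BLOBS.fetch(sha))
  rescue JSON::ParserError, KeyError
    nil
  end
end

cache['sha1']['title'] # => "first"
cache['sha1']          # served from the cache, no second read
cache['sha2']          # => nil
reads                  # => 2
```

Storing nil under the key (rather than leaving it absent) is what prevents repeated reads of a blob that cannot be parsed.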
#checkout(branch) ⇒ Object
    # File 'lib/gitgo/repo.rb', line 589
    def checkout(branch)
      git.checkout(branch)
      self
    end
#commit(msg = status) ⇒ Object
Commits any changes to git and writes the index to disk. The commit message is inferred from the status, if left unspecified. Commit will raise an error if there are no changes to commit.
    # File 'lib/gitgo/repo.rb', line 538
    def commit(msg=status)
      setup unless head
      sha = git.commit(msg)
      index.write(sha)
      sha
    end
#commit!(msg = status) ⇒ Object
Same as commit, but does not check whether there are changes to commit. This is useful when you know there are changes and want to avoid the overhead of checking.
    # File 'lib/gitgo/repo.rb', line 549
    def commit!(msg=status)
      setup unless head
      sha = git.commit!(msg)
      index.write(sha)
      sha
    end
#create(sha) ⇒ Object
Creates a create association for the sha:
sh/a/empty_sha (DEFAULT_MODE, sha)
    # File 'lib/gitgo/repo.rb', line 295
    def create(sha)
      git[sha_path(sha, empty_sha)] = [DEFAULT_MODE, sha]
      sha
    end
#delete(sha) ⇒ Object
Creates a delete association for the sha:
sh/a/sha (DEFAULT_MODE, empty_sha)
    # File 'lib/gitgo/repo.rb', line 332
    def delete(sha)
      git[sha_path(sha, sha)] = [DEFAULT_MODE, empty_sha]
      self
    end
#diff(a, b, type = 'A') ⇒ Object
Returns a list of document shas that have been added (‘A’) between a and b. Deleted (‘D’) or modified (‘M’) documents can be specified using type.
    # File 'lib/gitgo/repo.rb', line 479
    def diff(a, b, type='A')
      if a == b || b.nil?
        return []
      end

      paths = a.nil? ? git.ls_tree(b) : git.diff_tree(a, b)[type]
      paths.collect! do |path|
        ab, xyz, target = path.split('/', 3)
        assoc_sha("#{ab}#{xyz}", target)
      end

      paths.compact!
      paths
    end
#each ⇒ Object
Yields the sha of each document in the repo, in no particular order and with duplicates for every link/update that has multiple association sources.
    # File 'lib/gitgo/repo.rb', line 419
    def each
      git.tree.each_pair(true) do |ab, xyz_tree|
        next unless ab.length == 2

        xyz_tree.each_pair(true) do |xyz, target_tree|
          source = "#{ab}#{xyz}"

          target_tree.keys.each do |target|
            doc_sha = assoc_sha(source, target)
            yield(doc_sha) if doc_sha
          end
        end
      end
    end
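The traversal in #each can be sketched against nested hashes standing in for git trees, using a hypothetical layout of `{ ab => { xyz => { target => [mode, sha] } } }`. The `each_document` helper is illustrative: it applies the same rule as assoc_sha and, like #each, skips root entries such as the `gitgo` marker file because their names are not two characters long.

```ruby
# The sha git assigns to an empty blob (target of a create association).
EMPTY_SHA = 'e69de29bb2d1d6434b8b29ae775ad8c2e48c5391'

# Yields a document sha for every association: the source for create/delete
# associations, the target for link/update associations.
def each_document(tree)
  tree.each do |ab, xyz_tree|
    next unless ab.length == 2 # skips root entries like 'gitgo'

    xyz_tree.each do |xyz, target_tree|
      source = "#{ab}#{xyz}"
      target_tree.each_key do |target|
        yield(target == source || target == EMPTY_SHA ? source : target)
      end
    end
  end
end

a = 'a' * 40
b = 'b' * 40
# Hypothetical tree: a create association for a and a link from a to b.
tree = {
  'gitgo' => [:'100644', EMPTY_SHA],
  a[0, 2] => { a[2, 38] => { EMPTY_SHA => [:'100644', a], b => [:'100644', b] } }
}

docs = []
each_document(tree) { |sha| docs << sha }
docs == [a, b] # => true
```

Only the tree keys are read, which is why the traversal never has to load document blobs.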
#each_assoc(source) ⇒ Object
Yields each association for source to the block, with the association sha and type. Returns self.
    # File 'lib/gitgo/repo.rb', line 405
    def each_assoc(source) # :yields: sha, type
      return self if source.nil?

      target_tree = git.tree.subtree(sha_path(source))
      target_tree.each_pair do |target, (mode, sha)|
        yield assoc_sha(source, target), assoc_type(source, target, mode)
      end if target_tree

      self
    end
#empty_sha ⇒ Object
Returns the sha for an empty string, and ensures the corresponding object is set in the repo.
    # File 'lib/gitgo/repo.rb', line 248
    def empty_sha
      @empty_sha ||= git.set(:blob, '')
    end
#git ⇒ Object
    # File 'lib/gitgo/repo.rb', line 216
    def git
      env[GIT] ||= Git.init(path, env[OPTIONS] || {})
    end
#graph(sha) ⇒ Object
Initializes a Graph for the sha.
    # File 'lib/gitgo/repo.rb', line 435
    def graph(sha)
      Graph.new(self, sha)
    end
#head ⇒ Object
    # File 'lib/gitgo/repo.rb', line 189
    def head
      git.head
    end
#index ⇒ Object
Returns the Index instance set in env. If no instance is set then one will be initialized under the git working directory, specific to the git branch. For instance:
.git/gitgo/refs/branch/index
    # File 'lib/gitgo/repo.rb', line 226
    def index
      env[INDEX] ||= Index.new(File.join(git.work_dir, 'refs', git.branch, 'index'))
    end
#link(parent, child) ⇒ Object
Creates a link association for parent and child:
pa/rent/child (DEFAULT_MODE, child)
    # File 'lib/gitgo/repo.rb', line 314
    def link(parent, child)
      git[sha_path(parent, child)] = [DEFAULT_MODE, child]
      self
    end
#path ⇒ Object
    # File 'lib/gitgo/repo.rb', line 207
    def path
      env[PATH] ||= (env.has_key?(GIT) ? env[GIT].path : Dir.pwd)
    end
#read(sha) ⇒ Object
Reads and deserializes the specified hash of attrs. If sha does not indicate a blob that deserializes as JSON then read returns nil.
    # File 'lib/gitgo/repo.rb', line 302
    def read(sha)
      begin
        JSON.parse(git.get(:blob, sha).data)
      rescue JSON::ParserError, Errno::EISDIR
        nil
      end
    end
#refs ⇒ Object
Returns an array of refs representing gitgo branches.
    # File 'lib/gitgo/repo.rb', line 270
    def refs
      select_branches(git.grit.refs)
    end
#remotes ⇒ Object
Returns an array of remotes representing gitgo branches.
    # File 'lib/gitgo/repo.rb', line 275
    def remotes
      select_branches(git.grit.remotes)
    end
#reset(full = false) ⇒ Object
    # File 'lib/gitgo/repo.rb', line 594
    def reset(full=false)
      git.reset(full)
      cache.clear
      index.reset
      self
    end
#resolve(sha) ⇒ Object
    # File 'lib/gitgo/repo.rb', line 201
    def resolve(sha)
      git.resolve(sha) rescue sha
    end
#rev_list(sha) ⇒ Object
Returns an array of revisions (commits) reachable from the sha. These revisions are cached for quick retrieval.
    # File 'lib/gitgo/repo.rb', line 467
    def rev_list(sha)
      sha = sha.to_sym

      unless cache.has_key?(sha)
        cache[sha] = git.rev_list(sha.to_s)
      end

      cache[sha]
    end
#save(attrs) ⇒ Object
Serializes and sets the attributes as a blob in the git repo and caches the attributes by the blob sha. Returns the blob sha.
Note that save does not put the blob along a path in the repo; immediately after save the blob is dangling and will be garbage-collected by git unless it is set into a path by create, link, or update.
    # File 'lib/gitgo/repo.rb', line 285
    def save(attrs)
      sha = git.set(:blob, JSON.generate(attrs))
      cache[sha] = attrs
      sha
    end
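The sha returned by save (and by empty_sha) is git's content address for the blob: SHA-1 over a `blob <bytesize>` header, a NUL byte, and the content. The sketch below assumes a hypothetical in-memory `MemoryObjectStore` in place of git; it shows that identical serialized content always maps to the same sha, so changing a document's content necessarily produces a new sha.

```ruby
require 'digest/sha1'
require 'json'

# Computes the sha git assigns to a blob: SHA-1 over a "blob <bytesize>\0"
# header followed by the content.
def blob_sha(content)
  Digest::SHA1.hexdigest("blob #{content.bytesize}\0#{content}")
end

# Hypothetical in-memory stand-in for the git object database.
class MemoryObjectStore
  def initialize
    @blobs = {}
  end

  # Stores content under its sha; identical content maps to the same sha.
  def set(content)
    sha = blob_sha(content)
    @blobs[sha] = content
    sha
  end

  def get(sha)
    @blobs.fetch(sha)
  end
end

store = MemoryObjectStore.new
attrs = { 'title' => 'hello', 'tags' => ['a'] }
sha   = store.set(JSON.generate(attrs))
JSON.parse(store.get(sha)) == attrs # => true
blob_sha('') # => "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"
```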
#scope ⇒ Object
Sets self as the current Repo for the duration of the block.
    # File 'lib/gitgo/repo.rb', line 602
    def scope
      Repo.with_env(REPO => self) { yield }
    end
#setup(upstream_branch = nil) ⇒ Object
    # File 'lib/gitgo/repo.rb', line 557
    def setup(upstream_branch=nil)
      if head
        raise "already setup on: #{branch} (#{head})"
      end

      if upstream_branch.nil? || upstream_branch.empty?
        tree = Git::Tree.new
        tree[FILE] = [git.default_blob_mode, empty_sha]
        mode, sha = tree.write_to(git)

        git.commit!("setup gitgo", :tree => sha)

        current_tree = git.tree
        git.reset
        git.tree.merge!(current_tree)

        return self
      end

      unless branch?(upstream_branch)
        raise "not a gitgo branch: #{upstream_branch.inspect}"
      end

      if git.tracking_branch?(upstream_branch)
        git.track(upstream_branch)
      end
      git.merge(upstream_branch)

      cache.clear
      index.reset
      self
    end
#sha_path(sha, *paths) ⇒ Object
Creates a nested sha path like: ab/xyz/paths
    # File 'lib/gitgo/repo.rb', line 253
    def sha_path(sha, *paths)
      paths.unshift sha[2,38]
      paths.unshift sha[0,2]
      paths
    end
#status ⇒ Object
Generates a status message based on currently uncommitted changes.
    # File 'lib/gitgo/repo.rb', line 495
    def status
      unless block_given?
        return status {|sha| sha}
      end

      lines = []
      git.status.each_pair do |path, state|
        ab, xyz, target = path.split('/', 3)
        source = "#{ab}#{xyz}"

        sha  = assoc_sha(source, target)
        type = assoc_type(source, target)

        status = case assoc_type(source, target)
        when :create
          type = self[sha]['type']
          [type || 'doc', yield(sha)]
        when :link
          ['link', "#{yield(source)} to #{yield(target)}"]
        when :update
          ['update', "#{yield(target)} was #{yield(source)}"]
        when :delete
          ['delete', yield(sha)]
        else
          ['unknown', path]
        end

        if status
          status.unshift state_str(state)
          lines << status
        end
      end

      indent = lines.collect {|(state, type, msg)| type.length }.max
      format = "%s %-#{indent}s %s"
      lines.collect! {|ary| format % ary }
      lines.sort!
      lines.join("\n")
    end
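The column alignment at the end of #status can be illustrated on its own. The rows below are hypothetical, and the single-letter state strings stand in for whatever the private state_str helper returns; the type column is padded to the width of the longest type before the lines are sorted and joined.

```ruby
# Hypothetical rows of [state, type, message], as built inside #status.
lines = [
  ['A', 'update', 'c was b'],
  ['A', 'doc', 'a'],
  ['A', 'link', 'a to b']
]

# Pad the type column to the width of the longest type.
indent   = lines.map { |(_state, type, _msg)| type.length }.max
template = "%s %-#{indent}s %s"
output   = lines.map { |row| template % row }.sort.join("\n")
puts output
# A doc    a
# A link   a to b
# A update c was b
```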
#timeline(options = {}) ⇒ Object
Returns an array of shas representing recent documents added.
    # File 'lib/gitgo/repo.rb', line 440
    def timeline(options={})
      options = {:n => 10, :offset => 0}.merge(options)
      offset = options[:offset]
      n = options[:n]

      shas = []
      return shas if n <= 0

      dates = index.values('date').sort.reverse
      index.each_sha('date', dates) do |sha|
        if block_given?
          next unless yield(sha)
        end

        if offset > 0
          offset -= 1
        else
          shas << sha
          break if n && shas.length == n
        end
      end

      shas
    end
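The offset and limit logic in #timeline can be sketched as a standalone method. The `page` helper is hypothetical and takes an array of shas assumed to be sorted newest first, in place of the index lookups; the optional block filters which shas count toward the offset and the limit.

```ruby
# Hypothetical helper reproducing the offset/limit walk in Repo#timeline,
# over shas assumed to be sorted newest first.
def page(shas, n: 10, offset: 0)
  result = []
  return result if n <= 0

  shas.each do |sha|
    # An optional block filters which shas are eligible at all.
    next if block_given? && !yield(sha)

    if offset > 0
      offset -= 1 # skip the first `offset` eligible shas
    else
      result << sha
      break if result.length == n
    end
  end
  result
end

newest_first = %w[e d c b a]
page(newest_first, n: 2)                  # => ["e", "d"]
page(newest_first, n: 2, offset: 1)       # => ["d", "c"]
page(newest_first, n: 2) { |s| s != 'd' } # => ["e", "c"]
```

Filtering before the offset check means rejected shas never consume the offset, so pages stay consistent when a filter is applied.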
#update(old_sha, new_sha) ⇒ Object
Creates an update association for old and new shas:
ol/d_sha/new_sha (UPDATE_MODE, new_sha)
    # File 'lib/gitgo/repo.rb', line 323
    def update(old_sha, new_sha)
      git[sha_path(old_sha, new_sha)] = [UPDATE_MODE, new_sha]
      self
    end
#upstream_branch ⇒ Object
    # File 'lib/gitgo/repo.rb', line 197
    def upstream_branch
      git.upstream_branch
    end