dagnabit

dagnabit is (yet another) ActiveRecord plugin for directed acyclic graphs. It provides an underlying representation that facilitates fast reachability queries at the expense of edge insertion/deletion speed and execution time.

The data structure used by dagnabit is the structure described in:

DONG, G., LIBKIN, L., SU, J., and WONG, L. “Maintaining the transitive closure of graphs in SQL.” In Int. J. Information Technology, vol. 5, 1999.

dagnabit was developed for a Ruby on Rails application designed for management and querying of large-and-rapidly-growing biospecimen banks (i.e. for cancer research). The application uses directed acyclic graphs for knowledge representation (storage and application of workflows) and representing biospecimen heritage.

dagnabit was developed at the Northwestern University Biomedical Informatics Center <www.nucats.northwestern.edu/centers/nubic/index.html>.

Related work

This plugin was inspired by Matthew Leventi’s acts-as-dag plugin <github.com/mleventi/acts-as-dag/tree/master>. Indeed, Leventi’s acts-as-dag plugin was originally used in the application from which this plugin was extracted, and dagnabit’s node and edge interfaces were modeled after the interfaces provided by acts-as-dag.

The primary differences between dagnabit and acts-as-dag are:

  • acts-as-dag does not permit linking of unpersisted nodes, i.e.

    n1 = Node.new
    n2 = Node.new
    e1 = Link.new(:ancestor => n1, :descendant => n2)
    ... other code ...
    
    [n1, n2, e1].map { |o| o.save }
    

    With acts-as-dag, one must save the nodes before creating the edge. The above code segment works in dagnabit.

  • acts-as-dag issues O(x*y) queries on edge insert, update, and delete, where x and y are the number of ancestors and descendants of a node. In environments where network latency between database and application server is considerable, this can adversely affect application performance.

    dagnabit uses a constant number of queries for insert, update, and delete. (The execution time and memory usage of queries are, of course, still affected by the size of the transitive closure set being updated.)

Database compatibility

dagnabit is only known to work with SQLite and PostgreSQL. Other databases may work, but they have not been tested.

dagnabit is known to not work on Oracle XE 10g databases via the ActiveRecord oracle-enhanced adapter.

Setup

dagnabit is distributed as a gem plugin; you can therefore install it as you would any other RubyGem. Integration with your Rails application can be achieved by adding dagnabit to your application’s gem dependencies list.

Details for creating the link tables are discussed below.

In your link model, put

acts_as_dag_link

In your node models, put

acts_as_dag_node_linked_by (your link model name)

acts_as_dag_link accepts some configuration options. These configuration options are discussed below.

For a code example of a complete setup, see the dagnabit-test script in bin.

Link table schemas

Link storage is accomplished with two tables having identical schemas. One table holds explicitly specified links between nodes. The other table contains the _transitive closure_ of all graphs, i.e. one tuple per path in the graph. (The transitive closure table is therefore a superset of the direct link table.)

The transitive closure table is automatically maintained by dagnabit and should not be manually modified. The transitive closure table does not need to be mapped to a model in your application.

The expected schema for both tables is:

t.integer :ancestor_id      # or configured foreign key ID
t.integer :descendant_id    # or configured foreign key ID
t.string :ancestor_type     # not configurable
t.string :descendant_type   # not configurable

The behavior of the plugin with a deviant schema is unspecified.

  • transitive_closure_table_name: The table where tuples in the transitive closure of the graph should be stored. Defaults to #{link_table_name}_transitive_closure_tuples.

  • transitive_closure_class_name: The transitive closure is represented with an ActiveRecord class whose name defaults to TransitiveClosureLink. Set this option to change that name.

  • descendant_id_column: Column in the link tables for recording the ID of a descendant. Defaults to descendant_id.

  • ancestor_id_column: Column in the link tables for recording the ID of a ancestor. Defaults to ancestor_id.

Usage

The recommended usage pattern is (in pseudocode)

nodes = make_nodes
links = connect_up(nodes)
save(nodes)
save(links)

Node and edge creation is identical to creation of any ActiveRecord model. For example, if your Node models were called AlphaNode and BetaNode and your edge model was called Link, these would be valid expressions:

n1 = AlphaNode.new
n2 = BetaNode.new

e1 = Link.new(:ancestor => n1, :descendant => n2)
n1.save
n2.save
e1.save

Node interface

acts_as_dag_node adds the following associations to nodes:

  • links_as_child: Links for which the node is a child.

  • links_as_parent: Links for which the node is a parent.

  • links_as_ancestor: Links for which the node is an ancestor. Read-only.

  • links_as_descendant: Links for which the node is a descendant. Read-only.

acts_as_dag_node adds the following instance methods to nodes:

  • children: All children of the node. Read-only.

  • parents: All parents of the node. Read-only.

  • ancestors: All ancestors of the node. Read-only.

  • descendants: All descendants of the node. Read-only.

Link interface

The following class methods are available on links:

  • paths: Retrieves all paths from one node to another.

  • path?: Returns whether a node is reachable from another node.

  • build_edge: Builds an edge from one node to another node.

  • connect: Builds and saves an edge from one node to another. Identical to build_edge(nodeA, nodeB).save.

Introduction to the codebase

Here are some starting points to become acquainted with the dagnabit codebase:

Dagnabit::Activation

How to mix this functionality into your models.

Dagnabit::Link::Configuration

Link configuration options.

Dagnabit::Link::Associations

Associations exposed on links.

Dagnabit::Node::Associations

Associations exposed on nodes.

Dagnabit::Node::Neighbors

How to locate a node’s neighbors.

Dagnabit::Link::TransitiveClosureRecalculation

How the transitive closure is maintained.

Copyright © 2009 David Yip. Released under the MIT License; see LICENSE for details.