Closure Tree
Closure_tree lets your ActiveRecord models act as nodes in a tree data structure
Common applications include modeling hierarchical data, like tags, page graphs in CMSes, and tracking user referrals.
Substantially more efficient than ancestry and acts_as_tree, and even more awesome than awesome_nested_set, closure_tree has some great features:
- Best-in-class select performance:
- Fetch your whole ancestor lineage in 1 SELECT.
- Grab all your descendants in 1 SELECT.
- Get all your siblings in 1 SELECT.
- Fetch all descendants as a nested hash in 1 SELECT.
- Find a node by ancestry path in 1 SELECT.
- Best-in-class mutation performance:
- 2 SQL INSERTs on node creation
- 3 SQL INSERT/UPDATEs on node reparenting
- Support for Rails 3.0, 3.1, 3.2, and 4.0.0.rc1
- Support for reparenting children (and all their progeny)
- Support for concurrency (using with_advisory_lock)
- Support for polymorphism STI within the hierarchy
find_or_create_by_path
for building out hierarchies quickly and conveniently- Support for deterministic ordering of children
- Support for preordered traversal of descendants
- Support for rendering trees in DOT format, using Graphviz
- Excellent test coverage in a variety of environments
See Bill Karwin's excellent Models for hierarchical data presentation for a description of different tree storage algorithms.
Table of Contents
- Installation
- Usage
- Accessing Data
- Polymorphic hierarchies with STI
- Deterministic ordering
- Concurrency
- FAQ
- Testing
- Change log
Installation
Note that closure_tree only supports Rails 3.0 and later, and has test coverage for MySQL, PostgreSQL, and SQLite.
Add this to your Gemfile:
gem 'closure_tree'
Run
bundle install
Add
acts_as_tree
to your hierarchical model(s). There are a number of options you can pass in, too.Add a migration to add a
parent_id
column to the model you want to act_as_tree. You may want to also add a column for deterministic ordering of children, but that's optional.class AddParentIdToTag < ActiveRecord::Migration def change add_column :tag, :parent_id, :integer end end
Note that if the column is null, the tag will be considered a root node.
Add a database migration to store the hierarchy for your model. By default the table name will be the model's table name, followed by "_hierarchies". Note that by calling
acts_as_tree
, a "virtual model" (in this case,TagHierarchy
) will be added automatically, so you don't need to create it.class CreateTagHierarchies < ActiveRecord::Migration def change create_table :tag_hierarchies, :id => false do |t| t.integer :ancestor_id, :null => false # ID of the parent/grandparent/great-grandparent/... tag t.integer :descendant_id, :null => false # ID of the target tag t.integer :generations, :null => false # Number of generations between the ancestor and the descendant. Parent/child = 1, for example. end # For "all progeny of…" selects: add_index :tag_hierarchies, [:ancestor_id, :descendant_id], :unique => true # For "all ancestors of…" selects add_index :tag_hierarchies, [:descendant_id] end end
Run
rake db:migrate
If you're migrating from another system where your model already has a
parent_id
column, runTag.rebuild!
and yourtag_hierarchies
table will be truncated and rebuilt.If you're starting from scratch you don't need to call
rebuild!
.
Usage
Creation
Create a root node:
grandparent = Tag.create(:name => 'Grandparent')
Child nodes are created by appending to the children collection:
parent = grandparent.children.create(:name => 'Parent')
Or by appending to the children collection:
child2 = Tag.new(:name => 'Second Child')
parent.children << child2
Or by calling the "add_child" method:
child3 = Tag.new(:name => 'Third Child')
parent.add_child child3
Then:
grandparent.self_and_descendants.collect(&:name)
=> ["Grandparent", "Parent", "First Child", "Second Child", "Third Child"]
child1.ancestry_path
=> ["Grandparent", "Parent", "First Child"]
find_or_create_by_path
We can do all the node creation and add_child calls with one method call:
child = Tag.find_or_create_by_path(["grandparent", "parent", "child"])
You can find
as well as find_or_create
by "ancestry paths".
Ancestry paths may be built using any column in your model. The default
column is name
, which can be changed with the :name_column option
provided to acts_as_tree
.
Note that any other AR fields can be set with the second, optional attributes
argument,
and as of version 4.2.0, these attributes are added to the where clause as selection criteria.
child = Tag.find_or_create_by_path(%w{home chuck Photos"}, {:tag_type => "File"})
This will pass the attribute hash of {:name => "home", :tag_type => "File"}
to
Tag.find_or_create_by_name
if the root directory doesn't exist (and
{:name => "chuck", :tag_type => "File"}
if the second-level tag doesn't exist, and so on).
Moving nodes around the tree
Nodes can be moved around to other parents, and closure_tree moves the node's descendancy to the new parent for you:
d = Tag.find_or_create_by_path %w(a b c d)
h = Tag.find_or_create_by_path %w(e f g h)
e = h.root
d.add_child(e) # "d.children << e" would work too, of course
h.ancestry_path
=> ["a", "b", "c", "d", "e", "f", "g", "h"]
Nested hashes
hash_tree
provides a method for rendering a subtree as an
ordered nested hash:
b = Tag.find_or_create_by_path %w(a b)
a = b.parent
b2 = Tag.find_or_create_by_path %w(a b2)
d1 = b.find_or_create_by_path %w(c1 d1)
c1 = d1.parent
d2 = b.find_or_create_by_path %w(c2 d2)
c2 = d2.parent
Tag.hash_tree
=> {a => {b => {c1 => {d1 => {}}, c2 => {d2 => {}}}, b2 => {}}}
Tag.hash_tree(:limit_depth => 2)
=> {a => {b => {}, b2 => {}}}
b.hash_tree
=> {b => {c1 => {d1 => {}}, c2 => {d2 => {}}}}
b.hash_tree(:limit_depth => 2)
=> {b => {c1 => {}, c2 => {}}}
If your tree is large (or might become so), use :limit_depth.
Without this option, hash_tree
will load the entire contents of that table into RAM. Your
server may not be happy trying to do this.
Graph visualization
to_dot_digraph
is suitable for passing into Graphviz.
For example, for the above tree, write out the DOT file with ruby:
File.open("example.dot", "w") { |f| f.write(Tag.root.to_dot_digraph) }
Then, in a shell, dot -Tpng example.dot > example.png
, which produces:
If you want to customize the label value, override the #to_digraph_label
instance method in your model.
Just for kicks, this is the test tree I used for proving that preordered tree traversal was correct:
Available options
When you include acts_as_tree
in your model, you can provide a hash to override the following defaults:
:parent_column_name
to override the column name of the parent foreign key in the model's table. This defaults to "parent_id".:hierarchy_table_name
to override the hierarchy class name. This defaults to the singular name of the model + "Hierarchy", likeTagHierarchy
.:hierarchy_table_name
to override the hierarchy table name. This defaults to the singular name of the model + "_hierarchies", liketag_hierarchies
.:dependent
determines what happens when a node is destroyed. Defaults tonullify
.:nullify
will simply set the parent column to null. Each child node will be considered a "root" node. This is the default.:delete_all
will delete all descendant nodes (which circumvents the destroy hooks):destroy
will destroy all descendant nodes (which runs the destroy hooks on each child node)
:name_column
used by #find_or_create_by_path
, #find_by_path
, andancestry_path
instance methods. This is primarily useful if the model only has one required field (like a "tag").:order
used to set up deterministic ordering
Accessing Data
Class methods
Tag.root
returns an arbitrary root nodeTag.roots
returns all root nodesTag.leaves
returns all leaf nodesTag.hash_tree
returns an ordered, nested hash that can be depth-limited.Tag.find_by_path(path, attributes)
returns the node whose name path ispath
. See (#find_or_create_by_path).Tag.find_or_create_by_path(path, attributes)
returns the node whose name path ispath
, and will create the node if it doesn't exist already.See (#find_or_create_by_path).Tag.find_all_by_generation(generation_level)
returns the descendant nodes who aregeneration_level
away from a root.Tag.find_all_by_generation(0)
is equivalent toTag.roots
.Tag.with_ancestor(ancestors)
scopes to all descendants whose ancestor is in the given list.
Instance methods
tag.root
returns the root for this nodetag.root?
returns true if this is a root nodetag.child?
returns true if this is a child node. It has a parent.tag.leaf?
returns true if this is a leaf node. It has no children.tag.leaves
is scoped to all leaf nodes in self_and_descendants.tag.depth
returns the depth, or "generation", for this node in the tree. A root node will have a value of 0.tag.parent
returns the node's immediate parent. Root nodes will return nil.tag.children
is ahas_many
of immediate children (just those nodes whose parent is the current node).tag.child_ids
is an array of the IDs of the children.tag.ancestors
is a ordered scope of [ parent, grandparent, great grandparent, … ]. Note that the size of this array will always equaltag.depth
.tag.ancestor_ids
is an array of the IDs of the ancestors.tag.self_and_ancestors
returns a scope containing self, parent, grandparent, great grandparent, etc.tag.siblings
returns a scope containing all nodes with the same parent astag
, excluding self.tag.sibling_ids
returns an array of the IDs of the siblings.tag.self_and_siblings
returns a scope containing all nodes with the same parent astag
, including self.tag.descendants
returns a scope of all children, childrens' children, etc., excluding self ordered by depth.tag.descendant_ids
returns an array of the IDs of the descendants.tag.self_and_descendants
returns a scope of all children, childrens' children, etc., including self, ordered by depth.tag.hash_tree
returns an ordered, nested hash that can be depth-limited.tag.find_by_path(path)
returns the node whose name path fromtag
ispath
. See (#find_or_create_by_path).tag.find_or_create_by_path(path)
returns the node whose name path fromtag
ispath
, and will create the node if it doesn't exist already.See (#find_or_create_by_path).tag.find_all_by_generation(generation_level)
returns the descendant nodes who aregeneration_level
away fromtag
.tag.find_all_by_generation(0).to_a
==[tag]
tag.find_all_by_generation(1)
==tag.children
tag.find_all_by_generation(2)
will return the tag's grandchildren, and so on.
tag.destroy
will destroy a node and do something to its children, which is determined by the:dependent
option passed toacts_as_tree
.
Polymorphic hierarchies with STI
Polymorphic models using single table inheritance (STI) are supported:
- Create a db migration that adds a String
type
column to your model - Subclass the model class. You only need to add
acts_as_tree
to your base class:
class Tag < ActiveRecord::Base
acts_as_tree
end
class WhenTag < Tag ; end
class WhereTag < Tag ; end
class WhatTag < Tag ; end
Please note that Rails (<= 3.2) doesn't handle polymorphic associations correctly if
you use the :type
attribute, so this doesn't work:
# BAD: ActiveRecord ignores the :type attribute:
root.children.create(:name => "child", :type => "WhenTag")
Instead, use either .add_child
or children <<
:
# GOOD!
a = Tag.create!(:name => "a")
b = WhenTag.new(:name => "b")
a.children << b
c = WhatTag.new(:name => "c")
b.add_child(c)
See issue 43 for more information.
Deterministic ordering
By default, children will be ordered by your database engine, which may not be what you want.
If you want to order children alphabetically, and your model has a name
column, you'd do this:
class Tag < ActiveRecord::Base
acts_as_tree :order => 'name'
end
If you want a specific order, add a new integer column to your model in a migration:
t.integer :sort_order
and in your model:
class OrderedTag < ActiveRecord::Base
acts_as_tree :order => 'sort_order'
end
When you enable order
, you'll also have the following new methods injected into your model:
tag.siblings_before
is a scope containing all nodes with the same parent astag
, whose sort order column is less thanself
. These will be ordered properly, so thelast
element in scope will be the sibling immediately beforeself
tag.siblings_after
is a scope containing all nodes with the same parent astag
, whose sort order column is more thanself
. These will be ordered properly, so thefirst
element in scope will be the sibling immediately "after"self
If your order
column is an integer attribute, you'll also have these:
The class method
#roots_and_descendants_preordered
, which returns all nodes in your tree, pre-ordered.node1.self_and_descendants_preordered
which will return descendants, pre-ordered.node1.prepend_sibling(node2)
which will- set
node2
to the same parent asnode1
, - set
node2
's order column to 1 less thannode1
's value, and - decrement the order_column of all children of node1's parents whose order_column is <>>= node2's new value by 1.
- set
node1.append_sibling(node2)
which will- set
node2
to the same parent asnode1
, - set
node2
's order column to 1 more thannode1
's value, and - increment the order_column of all children of node1's parents whose order_column is >= node2's new value by 1.
- set
root = OrderedTag.create(:name => "root")
a = OrderedTag.create(:name => "a", :parent => "root")
b = OrderedTag.create(:name => "b")
c = OrderedTag.create(:name => "c")
# We have to call 'root.reload.children' because root won't be in sync with the database otherwise:
a.append_sibling(b)
root.reload.children.collect(&:name)
=> ["a", "b"]
a.prepend_sibling(b)
root.reload.children.collect(&:name)
=> ["b", "a"]
a.append_sibling(c)
root.reload.children.collect(&:name)
=> ["b", "a", "c"]
b.append_sibling(c)
root.reload.children.collect(&:name)
=> ["b", "c", "a"]
Concurrency
Several methods, especially #rebuild
and #find_or_create_by_path
, cannot run concurrently correctly.
#find_or_create_by_path
, for example, may create duplicate nodes.
Database row-level locks work correctly with PostgreSQL, but MySQL's row-level locking is broken, and erroneously reports deadlocks where there are none. To work around this, and have a consistent implementation for both MySQL and PostgreSQL, with_advisory_lock is used automatically to ensure correctness.
If you are already managing concurrency elsewhere in your application, and want to disable the use
of with_advisory_lock, pass :with_advisory_lock => false
in the options hash:
class Tag
acts_as_tree :with_advisory_lock => false
end
Note that you will eventually have data corruption if you disable advisory locks, write to your database with multiple threads, and don't provide an alternative mutex.
FAQ
Does this gem support multiple parents?
No. This gem's API is based on the assumption that each node has either 0 or 1 parent.
The underlying closure tree structure will support multiple parents, but there would be many breaking-API changes to support it. I'm open to suggestions and pull requests.
How do I use this with test fixtures?
Test fixtures aren't going to be running your after_save
hooks after inserting all your
fixture data, so you need to call .rebuild!
before your test runs. There's an example in
the spec tag_spec.rb
:
describe "Tag with fixtures" do
fixtures :tags
before :each do
Tag.rebuild! # <- required if you use fixtures
end
However, if you're just starting with Rails, may I humbly suggest you adopt a factory library, rather than using fixtures? Lots of people have written about this already.
Testing
Closure tree is tested under every valid combination of
- Ruby 1.8.7, Ruby 1.9.3, and Ruby 2.0.0
- The latest Rails 3.0, 3.1, 3.2, and 4.0 branches, and
- MySQL and PostgreSQL. SQLite works in a single-threaded environment.
Assuming you're using rbenv, you can use tests.sh
to
run the test matrix locally.
Parallelism is not tested with Rails 3.0.x nor 3.1.x due to this known issue.
Change log
4.2.0
- Added
with_ancestor(*ancestors)
. Thanks for the idea, Matt! - Applied Leonel Galan's fix for Strong Attribute support
find_or_create_by
now uses passed-in attributes as both selection and creation criteria. Thanks for the help, Judd Blair! Please note that this changes prior behavior—test your code with this new version!ct_advisory_lock
was moved into the_ct
support class, to reduce model method pollution
4.1.0
- Added support for Rails 4.0.0.rc1 and Ruby 2.0.0 (while maintaining backward compatibility with Rails 3, BOOYA)
- Added
#to_dot_digraph
, suitable for Graphviz rendering
4.0.1
Numeric, deterministically ordered siblings will always be 0..#self_and_siblingsself_and_siblings.count. Resolves issue 49. Thanks for the help, Leonel Galan, Juan Hoyos, and Michael Elfassy!
The
order
option can be a symbol now. Resolves issue 46.
4.0.0
- Moved all of closure_tree's implementation-detail methods into a
ClosureTree::Support
instance, which removes almost all of the namespace pollution in your models that wasn't for normal consumption. If you were using any of these methods, they're now available through the "_ct" class and instance member.
This change may break consumers, so I incremented the major version number, even though no new functionality was released.
3.10.2
Prevent faulty SQL statement when
#siblings
is called on an unsaved records. Resolves issue 52. Perfect pull request by Gary Greyling.The
.roots
class method now correctly respects the:order
option. Resolves issue 53. Thanks for finding this, Brendon Muir!
3.10.1
Multipart constant names like "Admin::PageHierarchy" are now supported. Resolves issue 47. Thanks for the perfect pull request, Simon Menke!
Committing transactions involving large numbers of hierarchy model classes was very slow due to hash collisions in the hierarchy class. A better hash implementation addressed issue 48. Thanks, Joel Turkel!
3.10.0
- Added
#roots_and_descendants_preordered
. Thanks for the suggestion, Leonel Galan!
3.9.0
- Added
.child_ids
. - Removed
dependent => destroy
on the descendant_hierarchy and ancestor_hierarchy collections (they were a mistake). - Clarified documentation for creation and child associations.
Because
Tag.create!(:parent => ...)
requires a.reload
, I removed it as an example.
All three of these improvements were suggested by Andrew Bromwich. Thanks!
3.8.2
- find_by_path uses 1 SELECT now. BOOM.
3.8.1
- Double-check locking for find_or_create_by_path
3.8.0
- Support for preordered descendants. This requires a numeric sort order column. Resolves feature request 38.
- Moved modules from
acts_as_tree
into separate files
3.7.3
Due to MySQL's inability to lock rows properly, I've switched to advisory_locks for all write paths. This will prevent deadlocks, addressing issue 41.
3.7.2
3.7.1
- Moved requires into ActiveSupport.on_load
- Added
require 'with_advisory_lock'
3.7.0
Thread safety!
- Advisory locks were
integrated with the class-level
find_or_create_by_path
andrebuild!
. - Pessimistic locking is used by the instance-level
find_or_create_by_path
.
3.6.9
- Don Morrison massaged the #hash_tree query to
be more efficient, and found a bug in
hash_tree
's query that resulted in duplicate rows, wasting time on the ruby side.
3.6.7
- Added workaround for ActiveRecord::Observer usage pre-db-creation. Addresses issue 32. Thanks, Don Morrison!
3.6.6
- Added support for Rails 4's strong parameter. Thanks, James Miller!
3.6.5
- Use
quote_table_name
instead ofquote_column_name
. Addresses issue 29. Thanks, Marcello Barnaba!
3.6.4
- Use
.pluck
when available for.ids_from
. Addresses issue 26. Thanks, Chris Sturgill!
3.6.3
- Fixed issue 24, which optimized
#hash_tree
for roots. Thanks, Saverio Trioni!
3.6.2
- Fixed issue 23, which added support for
#siblings
when sort_order wasn't specified. Thanks, Gary Greyling!
3.6.1
- Fixed issue 20, which affected deterministic ordering when siblings where different STI classes. Thanks, edwinramirez!
3.6.0
Added support for:
:hierarchy_class_name
as an option- ActiveRecord::Base.table_name_prefix
- ActiveRecord::Base.table_name_suffix
This addresses issue 21. Thanks, Judd Blair!
3.5.2
- Added
find_all_by_generation
for feature request 17.
3.4.2
- Fixed issue 18, which affected append_node/prepend_node ordering when the first node didn't have an explicit order_by value
3.4.1
- Reverted .gemspec mistake that changed add_development_dependency to add_runtime_dependency
3.4.0
Fixed issue 15:
- "parent" is now attr_accessible, which adds support for constructor-provided parents.
- updated readme accordingly
3.3.2
- Merged calebphillips' patch for a more efficient leaves query
3.3.1
- Added support for partially-unsaved hierarchies issue 13:
a = Tag.new(name: "a") b = Tag.new(name: "b") a.children << b a.save
3.3.0
- Added
hash_tree
.
3.2.1
- Added
ancestor_ids
,descendant_ids
, andsibling_ids
- Added example spec to solve issue 9
3.2.0
- Added support for deterministic ordering of nodes.
3.1.0
- Switched to using
has_many :though
rather thanhas_and_belongs_to_many
3.0.4
- Merged pull request to fix
.siblings
and.self_and_siblings
(Thanks, eljojo!)
3.0.3
- Added support for ActiveRecord's whitelist_attributes
(Make sure you read the Rails Security Guide, and
enable
config.active_record.whitelist_attributes
in yourconfig/application.rb
ASAP!)
3.0.2
- Fix for ancestry-loop detection (performed by a validation, not through raising an exception in before_save)
3.0.1
- Support 3.2.0's fickle deprecation of InstanceMethods (Thanks, jheiss)!
3.0.0
- Support for polymorphic trees
find_by_path
andfind_or_create_by_path
signatures changed to support constructor attributes- tested against Rails 3.1.3
2.0.0
- Had to increment the major version, as rebuild! will need to be called by prior consumers to support the new
leaves
class and instance methods. - Tag deletion is supported now along with
:dependent => :destroy
and:dependent => :delete_all
- Switched from default rails plugin directory structure to rspec
- Support for running specs under different database engines:
export DB ; for DB in sqlite3 mysql postgresql ; do rake ; done
Thanks to
- https://github.com/collectiveidea/awesome_nested_set
- https://github.com/patshaughnessy/class_factory
- JetBrains, which provides an open-source license to RubyMine for the development of this project.