closure_tree

Closure Tree

Closure_tree lets your ActiveRecord models act as nodes in a tree data structure

Common applications include modeling hierarchical data, like tags, threaded comments, page graphs in CMSes, and tracking user referrals.

CI Gem Version

Dramatically more performant than ancestry and acts_as_tree, and even more awesome than awesome_nested_set, closure_tree has some great features:

See Bill Karwin’s excellent Models for hierarchical data presentation for a description of different tree storage algorithms.

Table of Contents

Installation

Note that closure_tree only supports ActiveRecord 6.0 and later, and has test coverage for MySQL, PostgreSQL, and SQLite.

  1. Add gem 'closure_tree' to your Gemfile

  2. Run bundle install

  3. Add has_closure_tree (or acts_as_tree, which is an alias of the same method) to your hierarchical model:

    class Tag < ActiveRecord::Base
      has_closure_tree
    end
    
    class AnotherTag < ActiveRecord::Base
      acts_as_tree
    end
    

    Make sure you check out the large number of options that has_closure_tree accepts.

    IMPORTANT: Make sure you add has_closure_tree after attr_accessible and self.table_name = lines in your model.

    If you’re already using other hierarchical gems, like ancestry or acts_as_tree, please refer to the warning section!

  4. Add a migration to add a parent_id column to the hierarchical model. You may want to also add a column for deterministic ordering of children, but that’s optional.

    class AddParentIdToTag < ActiveRecord::Migration
      def change
        add_column :tags, :parent_id, :integer
      end
    end
    

    The column must be nullable. Root nodes have a NULL parent_id.

  5. Run rails g closure_tree:migration tag (and replace tag with your model name) to create the closure tree table for your model.

    By default the table name will be the model’s table name, followed by “_hierarchies”. Note that by calling has_closure_tree, a “virtual model” (in this case, TagHierarchy) will be created dynamically. You don’t need to create it.

  6. Run rake db:migrate

  7. If you’re migrating from another system where your model already has a parent_id column, run Tag.rebuild! and your tag_hierarchies table will be truncated and rebuilt.

    If you’re starting from scratch you don’t need to call rebuild!.

NOTE: Run rails g closure_tree:config to create an initializer with extra configurations. (Optional)

Warning

As stated above, using multiple hierarchy gems (like ancestry or nested set) on the same model will most likely result in pain, suffering, hair loss, tooth decay, heel-related ailments, and gingivitis. Assume things will break.

Usage

Creation

Create a root node:

grandparent = Tag.create(name: 'Grandparent')

Child nodes are created by appending to the children collection:

parent = grandparent.children.create(name: 'Parent')

Or by appending to the children collection:

child2 = Tag.new(name: 'Second Child')
parent.children << child2

Or by calling the “add_child” method:

child3 = Tag.new(name: 'Third Child')
parent.add_child child3

Or by setting the parent on the child :

Tag.create(name: 'Fourth Child', parent: parent)

Then:

grandparent.self_and_descendants.collect(&:name)
=> ["Grandparent", "Parent", "First Child", "Second Child", "Third Child", "Fourth Child"]

child1.ancestry_path
=> ["Grandparent", "Parent", "First Child"]

find_or_create_by_path

You can find as well as find_or_create by “ancestry paths”.

If you provide an array of strings to these methods, they reference the name column in your model, which can be overridden with the :name_column option provided to has_closure_tree.

child = Tag.find_or_create_by_path(%w[grandparent parent child])

As of v5.0.0, find_or_create_by_path can also take an array of attribute hashes:

child = Tag.find_or_create_by_path([
  {name: 'Grandparent', title: 'Sr.'},
  {name: 'Parent', title: 'Mrs.'},
  {name: 'Child', title: 'Jr.'}
])

If you’re using STI, The attribute hashes can contain the sti_name and things work as expected:

child = Label.find_or_create_by_path([
  {type: 'DateLabel', name: '2014'},
  {type: 'DateLabel', name: 'August'},
  {type: 'DateLabel', name: '5'},
  {type: 'EventLabel', name: 'Visit the Getty Center'}
])

Moving nodes around the tree

Nodes can be moved around to other parents, and closure_tree moves the node’s descendancy to the new parent for you:

d = Tag.find_or_create_by_path %w[a b c d]
h = Tag.find_or_create_by_path %w[e f g h]
e = h.root
d.add_child(e) # "d.children << e" would work too, of course
h.ancestry_path
=> ["a", "b", "c", "d", "e", "f", "g", "h"]

When it is more convenient to simply change the parent_id of a node directly (for example, when dealing with a form <select>), closure_tree will handle the necessary changes automatically when the record is saved:

j = Tag.find 102
j.self_and_ancestor_ids
=> [102, 87, 77]
j.update parent_id: 96
j.self_and_ancestor_ids
=> [102, 96, 95, 78]

Nested hashes

hash_tree provides a method for rendering a subtree as an ordered nested hash:

b = Tag.find_or_create_by_path %w(a b)
a = b.parent
b2 = Tag.find_or_create_by_path %w(a b2)
d1 = b.find_or_create_by_path %w(c1 d1)
c1 = d1.parent
d2 = b.find_or_create_by_path %w(c2 d2)
c2 = d2.parent

Tag.hash_tree
=> {a => {b => {c1 => {d1 => {}}, c2 => {d2 => {}}}, b2 => {}}}

Tag.hash_tree(:limit_depth => 2)
=> {a => {b => {}, b2 => {}}}

b.hash_tree
=> {b => {c1 => {d1 => {}}, c2 => {d2 => {}}}}

b.hash_tree(:limit_depth => 2)
=> {b => {c1 => {}, c2 => {}}}

If your tree is large (or might become so), use :limit_depth.

Without this option, hash_tree will load the entire contents of that table into RAM. Your server may not be happy trying to do this.

HT: ancestry and elhoyos

Eager loading

Since most of closure_tree’s methods (e.g. children) return regular ActiveRecord scopes, you can use the includes method for eager loading, e.g.

comment.children.includes(:author)

However, note that the above approach only eager loads the requested associations for the immediate children of comment. If you want to walk through the entire tree, you may still end up making many queries and loading duplicate copies of objects.

In some cases, a viable alternative is the following:

comment.self_and_descendants.includes(:author)

This would load authors for comment and all its descendants in a constant number of queries. However, the return value is an array of Comments, and the tree structure is thus lost, which makes it difficult to walk the tree using elegant recursive algorithms.

A third option is to use has_closure_tree_root on the model that is composed by the closure_tree model (e.g. a Post may be composed by a tree of Comments). So in post.rb, you would do:

# app/models/post.rb
has_closure_tree_root :root_comment

This gives you a plain has_one association (root_comment) to the root Comment (i.e. that with null parent_id).

It also gives you a method called root_comment_including_tree, which you can invoke as follows:

a_post.root_comment_including_tree(:author)

The result of this call will be the root Comment with all descendants and associations loaded in a constant number of queries. Inverse associations are also setup on all nodes, so as you walk the tree, calling children or parent on any node will not trigger any further queries and no duplicate copies of objects are loaded into memory.

The class and foreign key of root_comment are assumed to be Comment and post_id, respectively. These can be overridden in the usual way.

The same caveat stated above with hash_tree also applies here: this method will load the entire tree into memory. If the tree is very large, this may be a bad idea, in which case using the eager loading methods above may be preferred.

Graph visualization

to_dot_digraph is suitable for passing into Graphviz.

For example, for the above tree, write out the DOT file with ruby:

File.open("example.dot", "w") { |f| f.write(Tag.root.to_dot_digraph) }

Then, in a shell, dot -Tpng example.dot > example.png, which produces:

Example tree

If you want to customize the label value, override the #to_digraph_label instance method in your model.

Just for kicks, this is the test tree I used for proving that preordered tree traversal was correct:

Preordered test tree

Available options

When you include has_closure_tree in your model, you can provide a hash to override the following defaults:

Accessing Data

Class methods

Polymorphic hierarchies with STI

Polymorphic models using single table inheritance (STI) are supported:

  1. Create a db migration that adds a String type column to your model
  2. Subclass the model class. You only need to add has_closure_tree to your base class:
class Tag < ActiveRecord::Base
  has_closure_tree
end
class WhenTag < Tag ; end
class WhereTag < Tag ; end
class WhatTag < Tag ; end

Note that if you call rebuild! on any of the subclasses, the complete Tag hierarchy will be emptied, thus taking the hiearchies of all other subclasses with it (issue #275). However, only the hierarchies for the class rebuild! was called on will be rebuilt, leaving the other subclasses without hierarchy entries.

You can work around that by overloading the rebuild! class method in all your STI subclasses and call the super classes rebuild! method:

class WhatTag < Tag
  def self.rebuild!
    Tag.rebuild!
  end
end

This way, the complete hierarchy including all subclasses will be rebuilt.

Deterministic ordering

By default, children will be ordered by your database engine, which may not be what you want.

If you want to order children alphabetically, and your model has a name column, you’d do this:

class Tag < ActiveRecord::Base
  has_closure_tree order: 'name'
end

If you want a specific order, add a new integer column to your model in a migration:

t.integer :sort_order

and in your model:

class OrderedTag < ActiveRecord::Base
  has_closure_tree order: 'sort_order', numeric_order: true
end

When you enable order, you’ll also have the following new methods injected into your model:

If your order column is an integer attribute, you’ll also have these:


root = OrderedTag.create(name: 'root')
a = root.append_child(Label.new(name: 'a'))
b = OrderedTag.create(name: 'b')
c = OrderedTag.create(name: 'c')

# We have to call 'root.reload.children' because root won't be in sync with the database otherwise:

a.append_sibling(b)
root.reload.children.pluck(:name)
=> ["a", "b"]

a.prepend_sibling(b)
root.reload.children.pluck(:name)
=> ["b", "a"]

a.append_sibling(c)
root.reload.children.pluck(:name)
=> ["b", "a", "c"]

b.append_sibling(c)
root.reload.children.pluck(:name)
=> ["b", "c", "a"]

Ordering Roots

With numeric ordering, root nodes are, by default, assigned order values globally across the whole database table. So for instance if you have 5 nodes with no parent, they will be ordered 0 through 4 by default. If your model represents many separate trees and you have a lot of records, this can cause performance problems, and doesn’t really make much sense.

You can disable this default behavior by passing dont_order_roots: true as an option to your delcaration:

has_closure_tree order: 'sort_order', numeric_order: true, dont_order_roots: true

In this case, calling prepend_sibling and append_sibling on a root node or calling roots_and_descendants_preordered on the model will raise a RootOrderingDisabledError.

The dont_order_roots option will be ignored unless numeric_order is set to true.

Concurrency

Several methods, especially #rebuild and #find_or_create_by_path, cannot run concurrently correctly. #find_or_create_by_path, for example, may create duplicate nodes.

Database row-level locks work correctly with PostgreSQL, but MySQL’s row-level locking is broken, and erroneously reports deadlocks where there are none. To work around this, and have a consistent implementation for both MySQL and PostgreSQL, with_advisory_lock is used automatically to ensure correctness.

If you are already managing concurrency elsewhere in your application, and want to disable the use of with_advisory_lock, pass with_advisory_lock: false in the options hash:

class Tag
  has_closure_tree with_advisory_lock: false
end

Note that you will eventually have data corruption if you disable advisory locks, write to your database with multiple threads, and don’t provide an alternative mutex.

I18n

You can customize error messages using I18n:

en-US:
  closure_tree:
    loop_error: Your descendant cannot be your parent!

FAQ

Are there any how-to articles on how to use this gem?

Yup! Ilya Bodrov wrote Nested Comments with Rails.

Does this work well with #default_scope?

No. Please see issue 86 for details.

Can I update parentage with update_attribute?

No. update_attribute skips the validation hook that is required for maintaining the hierarchy table.

Can I assign a parent to multiple children with #update_all?

No. Please see issue 197 for details.

Does this gem support multiple parents?

No. This gem’s API is based on the assumption that each node has either 0 or 1 parent.

The underlying closure tree structure will support multiple parents, but there would be many breaking-API changes to support it. I’m open to suggestions and pull requests.

How do I use this with test fixtures?

Test fixtures aren’t going to be running your after_save hooks after inserting all your fixture data, so you need to call .rebuild! before your test runs. There’s an example in the spec tag_spec.rb:

  describe "Tag with fixtures" do
    fixtures :tags
    before :each do
      Tag.rebuild! # <- required if you use fixtures
    end

However, if you’re just starting with Rails, may I humbly suggest you adopt a factory library, rather than using fixtures? Lots of people have written about this already.

There are many lock-* files in my project directory after test runs

This is expected if you aren’t using MySQL or Postgresql for your tests.

SQLite doesn’t have advisory locks, so we resort to file locking, which will only work if the FLOCK_DIR is set consistently for all ruby processes.

In your spec_helper.rb or minitest_helper.rb, add a before and after block:

before do
  ENV['FLOCK_DIR'] = Dir.mktmpdir
end

after do
  FileUtils.remove_entry_secure ENV['FLOCK_DIR']
end

bundle install says Gem::Ext::BuildError: ERROR: Failed to build gem native extension

When building from source, the mysql2, pg, and sqlite gems need their native client libraries installed on your system. Note that this error isn’t specific to ClosureTree.

On Ubuntu/Debian systems, run:

sudo apt-get install libpq-dev libsqlite3-dev libmysqlclient-dev
bundle install

Object destroy fails with MySQL v5.7+

A bug was introduced in MySQL’s query optimizer. See the workaround here.

Hierarchy maintenance errors from MySQL v5.7.9-v5.7.10

Upgrade to MySQL 5.7.12 or later if you see this issue:

Mysql2::Error: You can't specify target table '*_hierarchies' for update in FROM clause

Testing with Closure Tree

Closure tree comes with some RSpec2/3 matchers which you may use for your tests:

require 'spec_helper'
require 'closure_tree/test/matcher'

describe Category do
 # Should syntax
 it { should be_a_closure_tree }
 # Expect syntax
 it { is_expected.to be_a_closure_tree }
end

describe Label do
 # Should syntax
 it { should be_a_closure_tree.ordered }
 # Expect syntax
 it { is_expected.to be_a_closure_tree.ordered }
end

describe TodoList::Item do
 # Should syntax
 it { should be_a_closure_tree.ordered(:priority_order) }
 # Expect syntax
 it { is_expected.to be_a_closure_tree.ordered(:priority_order) }
end

Testing

Closure tree is tested under every valid combination of

$ bundle
$ appraisal bundle # this will install the matrix of dependencies
$ appraisal rake # this will run the tests in all combinations
$ appraisal activerecord-7.0 rake # this will run the tests in AR 7.0 only
$ appraisal activerecord-7.0 rake spec # this will run rspec in AR 7.0 only
$ appraisal activerecord-7.0 rake test # this will run minitest in AR 7.0 only

By default the test are run with sqlite3 only. You run test with other databases by passing the database url as environment variable:

$ DATABASE_URL=postgres://localhost/my_database appraisal activerecord-7.0 rake test 

Change log

See the change log.

Thanks to