What?

Peegee (pronounced “Pee – Gee”, as in the letters PG, for Postgres) is a utility that provides an abstraction for performing PostgreSQL related tasks from Ruby, or your Rails application.

Why?

This project started with the need to improve the performance of PostgreSQL’s CLUSTER command. While clustering your database tables improves query performance significantly, the actual process of clustering takes unacceptable amounts of time. For instance, it may take roughly 20 hours on a moderately big set of tables (around 60GB). Your milage may vary, of course. Refactoring the clustering functionality surfaced a clean set of utilities that resulted in the creation of this gem. Some are useful, and some are just pointless.

Example usage

Assuming we’re in a Rails app, you can simply do:

people_table = Peegee::Table.new(:table_name => 'people')
people_table.ddl # a string containing this table's DDL
people_table.indexes
people_table.foreign_keys
people_table.dependent_foreign_keys

The three commands above return arrays of Peegee::Index and Peegee::ForeignKey objects, which all respond to drop and create. For example, you could drop and recreate the first dependent foreign key (some other table referencing people) with:

people_table.dependent_foreign_keys.first.drop
people_table.dependent_foreign_keys.first.create

After the call to drop, the @dependent_foreign_keys instance variable on the people_table object remains cached, and calling create will simply execute the cached SQL required to recreate the database object.

The strategy used for clustering a table is to store all dependencies in order to restore them later (as viewed above), and move all of the table’s data on the order given by a certain index. This is all executed in a database transaction in case it blows up. The result should be just like the pgsql native CLUSTER command:

people_table.cluster('people_pk') #must specify an index by which to cluster

Todo

The first order of business will be to create specs where applicable.

Then, the idea is to keep adding functionality that may be useful for DBAs or application developers and other PostgreSQL users. Some that come to mind are:

Ability to clean out all of table’s index and foreign key names, following a given pattern. On occasions, table names are altered leaving behind pesky legacy names for related objects.
Ability to identify database stinks. For instance:
- tables without primary keys, or indexes (which are not pure join tables)
- foreign keys without indexes (useful for manually identifying and fixing if applicable),
- report on columns that look like foreign keys, but aren’t (based on rails conventions).
- columns that are nullable, and are flagged as unique
- tables with incrementing columns names, indiciating a possible denormalization
- columns with a default value of ‘NULL’ (the varchar), where NULL may have been intended. (much of these are stolen from the excellent SchemaSpy)

There’s also a bunch of #TODO tags in the code, where there’s clearly room for improvement.

I would also like to make this code ORM agnostic. Right now, it depends on ActiveRecord to retrieve a connection and execute commands.

Please fork away and help improve it.

Revision History

Version 0.1.1 (April 13th, 2009):
- Set up RSpec testing environment. Started creating test suite.
- Added the Configuration singleton class, which for now allows the user to specify which indexes to cluster each table by.
- Added exists? class method to Peegee::Table class. Creating a Peegee::Table instance now fails if the table does not exist in the database.
- Added bang (!) methods to Peegee::Table class, which force the lookup of foreign_keys!, constraints!, indexes!, etc, even when they’re already cached.
- Fixed bug in cluster method.

Version 0.1.0 (April 10th, 2009):
- First release.

What?

Why?

Example usage

Todo

Revision History

License