EPO
EPO is a no-brainer, plain-Ruby filesystem database. If you have objects and need to store them in a clean hierarchy, EPO is a good choice. It is not a good choice if you need optimized queries or operate on big datasets. In EPO, each object has its own directory, so it is easy to run programs that write their output to a given file without resorting to temp dirs. Similarly, if you run lots of batch operations on your EPO resources, the one-directory-per-resource approach scales well.
Philosophy
EPO is only a small library, so summarizing its philosophy takes only a few lines:
- My filesystem already is a database
- My database should be plain Ruby
- But my database should be easy to use from non-Ruby programs
The EPO hierarchy
EPO tries to build from the lessons of KISS and REST routes. If you’re familiar with REST applications, you will find EPO’s hierarchy pretty straightforward.
Important Rules
Here are the simple rules for EPO’s databases:
- EPO operates on Welo::Resources
- each resource has one directory
- the directory’s path identifies the resource stored
- there is one serialization file per (resource, perspective, extension) tuple
- any other file in the directory should not impact your ruby application, but may have meaning to other programs (e.g., a thumbnail created by your file-viewer)
Examples
Suppose you have a resource “user” identified by its “login”, and three users: peter, jon, and marc. The database directories may look like the following. The labels between brackets refer to the explanations further down.
$ tree db/
db/
└── user
    ├── peter
    │   ├── resource-epo-default.json [1a]
    │   └── picture.png [1b]
    ├── jon
    │   ├── resource-epo-default.json [2a]
    │   ├── picture.png [2b]
    │   └── pubkey.rsa.txt [2c]
    └── marc
        ├── resource-epo-default.json [3a]
        ├── picture.png [3b]
        └── thumbnail.dat [3c]
In the previous hierarchy, [1a, 2a, 3a] are the serializations of the user resources under the ‘default’ (or :default) perspective in the JSON format. A perspective is a concept from Welo’s resources: basically a list of an object’s fields that you want to dump or observe (for instance, an administrator may have access to more details than an unregistered user).
The other files may or may not be related to your ruby application. A plausible explanation is: [1b, 2b, 3b] are pictures, most likely headshots uploaded by your users through your application. User jon has provided his public key in [2c], and your file browser has created a thumbnail in [3c], which has nothing to do with your application.
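The naming pattern visible in the tree above can be sketched in a few lines of plain Ruby. This is only an illustration: `serialization_path` and `describe` are hypothetical helpers, not part of EPO, and the `resource-epo-<perspective>.<extension>` pattern is inferred from the example file names rather than from EPO's source.

```ruby
# Hypothetical helper (not EPO's API): builds the path of a serialization
# file from the pattern seen in the example tree.
def serialization_path(root, kind, id, perspective: 'default', ext: 'json')
  File.join(root, kind, id, "resource-epo-#{perspective}.#{ext}")
end

# Hypothetical helper: recognizes serialization files and returns nil for
# any other file, which mirrors the "other files do not matter" rule.
def describe(path)
  kind, id, file = path.split('/').last(3)
  match = /\Aresource-epo-(?<perspective>[^.]+)\.(?<ext>\w+)\z/.match(file)
  return nil unless match

  { kind: kind, id: id, perspective: match[:perspective], ext: match[:ext] }
end

puts serialization_path('db', 'user', 'peter')
# prints db/user/peter/resource-epo-default.json

p describe('db/user/marc/resource-epo-default.json')
p describe('db/user/marc/thumbnail.dat') # not a serialization file, so nil
```

Because the path carries the resource kind, the identifier, the perspective and the format, no extra index is needed to find a resource.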
Attention
The main thing to keep in mind is that filesystems come with a slight problem: file paths are strings. As a result, when designing your EPO hierarchy, you must be aware of:
- encoding issues
- case sensitivity
- ambiguities with path separators
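The three pitfalls above can be checked before an identifier is ever used as a directory name. The sketch below is an assumption about how one might guard against them; `safe_segment?` and `case_collisions` are hypothetical names, EPO itself does not provide them, and the case check relies on a simple `downcase` that ignores Unicode normalization.

```ruby
# Hypothetical check (not part of EPO): can this identifier be used as a
# single directory name without surprises?
def safe_segment?(name)
  return false unless name.is_a?(String) && name.valid_encoding? # encoding issues
  return false if name.empty? || %w[. ..].include?(name)         # special entries
  return false if name.include?('/') || name.include?('\\')      # path separators
  return false if name.include?("\0")                            # rejected by most filesystems

  true
end

# Groups identifiers that would end up in the same directory on a
# case-insensitive filesystem.
def case_collisions(names)
  names.group_by(&:downcase).values.select { |group| group.size > 1 }
end

p safe_segment?('peter')                 # true
p safe_segment?('a/b')                   # false
p case_collisions(%w[Peter peter jon])   # [["Peter", "peter"]]
```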
Benefits
Examples of things you can do easily with EPO:
- use a filesystem explorer to visualize/explore your DB
- organize your documents without hassle
- use bash scripts for batch processing
- use rsync/NFS/git on all or part of your database’s content
- use ftp/http servers to expose a branch of your DB’s hierarchy
- have other programs use your DB easily without configuring databases
Say you have photos to sort. You may sort them by year, by place, by subject, or by other criteria. Some software offers to tag your photos and build a database of them for you. Unfortunately, such software is often closed: you cannot re-use its database, or it takes a lot of effort to script it into doing something it was not designed for (e.g., resizing your pictures in a batch). Moreover, the file viewer of your OS may be good enough to view your pictures. Filesystems have been solving many database issues for ages. So, why not just rely on your filesystem to store your pictures?
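As an illustration of the batch-processing and interoperability points, here is a minimal sketch of a script that reads such a hierarchy without loading EPO at all. The directory layout follows the example tree; the `login` key inside the JSON files is an assumption made for the demo, not a statement about what EPO actually writes.

```ruby
require 'json'
require 'tmpdir'
require 'fileutils'

logins = Dir.mktmpdir do |root|
  # Build a tiny hierarchy shaped like the example tree (illustrative data).
  %w[peter jon marc].each do |login|
    dir = File.join(root, 'user', login)
    FileUtils.mkdir_p(dir)
    File.write(File.join(dir, 'resource-epo-default.json'),
               JSON.generate('login' => login))
  end

  # Any program can walk the tree with ordinary glob patterns.
  pattern = File.join(root, 'user', '*', 'resource-epo-default.json')
  Dir.glob(pattern).sort.map do |file|
    JSON.parse(File.read(file)).fetch('login')
  end
end

p logins # ["jon", "marc", "peter"]
```

The same glob pattern works from a shell script or any other language, which is the point of keeping the database on the filesystem.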
Usage (please have a look at the examples directory)
EPO is simple
There is nothing complex in EPO. For example, there are no:
- indexes
- transactions
- thread/multiprocess safety guarantees
- lifecycle hooks
If you want to add a missing feature, feel free to fork/subclass/add a module, or implement it directly on your filesystem.
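As an example of implementing a missing feature directly on the filesystem, the sketch below shows a common write-then-rename trick that keeps readers from ever seeing a half-written file. It is not a transaction and it is not something EPO does; `atomic_write` is a hypothetical helper, and the rename is only atomic on POSIX filesystems when both paths are on the same volume.

```ruby
require 'tmpdir'

# Hypothetical helper (not part of EPO): write to a temporary sibling file,
# then rename it over the target in one step.
def atomic_write(path, content)
  tmp = "#{path}.tmp-#{Process.pid}"
  File.open(tmp, 'w') do |f|
    f.write(content)
    f.fsync # ask the OS to push the data to disk before the rename
  end
  File.rename(tmp, path)
ensure
  # Clean up the temporary file if something failed before the rename.
  File.delete(tmp) if tmp && File.exist?(tmp)
end

content, entries = Dir.mktmpdir do |dir|
  target = File.join(dir, 'resource-epo-default.json')
  atomic_write(target, '{"login":"peter"}')
  atomic_write(target, '{"login":"peter","admin":true}')
  [File.read(target), Dir.children(dir)]
end

puts content
p entries # only the target file remains
```

While it exists, the temporary file is just another file in the resource's directory, so it falls under the rule that unknown files should not impact your application.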
Theory
EPO uses:
- Derailleur’s path routing to quickly map paths to resources (we may want to remove this dependency later)
- Welo’s observers to hook events when iterating on the filesystem
EPO::DB instances are just simple, in-memory Ruby objects. They have no connection to take care of and no credentials.
An EPO::DB may understand several formats (json and yaml are standard choices). You may modify this by changing EPO::DB#extensions.
EPO::DB instances are headless, in the sense that they don’t store their hierarchy’s root in a variable. As a result, an EPO::DB holds the concepts of the models but is not actually tied to the data on the filesystem. You may have two EPO::DB instances on the same filesystem root (each understanding different resources or formats).
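The headless idea can be illustrated with a toy class. To be clear, `HeadlessStore` below is NOT EPO::DB and shares none of its code; it is a hypothetical stand-in that only shows what it means to keep the list of formats in the object while passing the root on every call.

```ruby
require 'json'
require 'yaml'
require 'tmpdir'
require 'fileutils'

# Toy stand-in for the concept only (hypothetical, not EPO's implementation).
class HeadlessStore
  SERIALIZERS = {
    'json' => ->(data) { JSON.generate(data) },
    'yaml' => ->(data) { YAML.dump(data) }
  }.freeze

  attr_accessor :extensions

  def initialize(extensions = %w[json])
    @extensions = extensions
  end

  # The root is an argument, never an instance variable.
  def save(root, kind, id, data)
    dir = File.join(root, kind, id)
    FileUtils.mkdir_p(dir)
    extensions.map do |ext|
      path = File.join(dir, "resource-epo-default.#{ext}")
      File.write(path, SERIALIZERS.fetch(ext).call(data))
      path
    end
  end
end

# Two stores with different formats can share one root.
json_store = HeadlessStore.new(%w[json])
yaml_store = HeadlessStore.new(%w[yaml])

files = Dir.mktmpdir do |root|
  json_store.save(root, 'person', 'peter', { 'name' => 'peter' })
  yaml_store.save(root, 'person', 'peter', { 'name' => 'peter' })
  Dir.children(File.join(root, 'person', 'peter')).sort
end

p files # ["resource-epo-default.json", "resource-epo-default.yaml"]
```

Since neither object remembers a root, the same two stores could be pointed at any other directory on the next call.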
Commented code bits
Say you have two models (which are Welo::Resources): Person and Item. Both models must have an “identifying” named :flat_db (this is just a default convention).
class Person
  include Welo::Resource
  identify :flat_db, [:name]
  ...
end
To create a database only able to handle persons:
EPO::DB.new([Person])
To create a database able to handle persons and items:
db = EPO::DB.new([Person, Item])
To save a person with the default perspective, in all formats (root is the hierarchy’s root, which the DB does not store itself):
person = Person.new(...)
db.save(root, person)
To iterate on all the DB items (as observations):
db.each_resource(root) { |observation| ... }
Dependencies
- welo >= 0.1.0
- derailleur >= 0.5.0
- a JSON library if you use json formatting (recommended)
Benchmark
Config:
- macbook pro (2010), without SSD:
  - ruby 1.9.2
  - json 1.5.1
$ ruby example/benchmark.rb
0.770000 1.020000 1.790000 ( 4.563546)
5935
1.160000 0.420000 1.580000 ( 1.601897)
So, it takes roughly 4.6 seconds of wall-clock time to write 6000 records and 1.6 seconds to read them back. No sync operation or cache flush was forced between the two measurements, which may be why only 5935 of the 6000 records could be read back.
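To rule buffering in or out as a cause, one could force a sync after each write and compare the counts. The sketch below is not example/benchmark.rb and does not use EPO; it is a plain-Ruby approximation with a hypothetical `record/<id>` layout and a much smaller record count so that it runs quickly.

```ruby
require 'benchmark'
require 'json'
require 'tmpdir'
require 'fileutils'

count = 200 # kept small on purpose; the run above used 6000 records

read_back = Dir.mktmpdir do |root|
  write_time = Benchmark.measure do
    count.times do |i|
      dir = File.join(root, 'record', i.to_s)
      FileUtils.mkdir_p(dir)
      File.open(File.join(dir, 'resource-epo-default.json'), 'w') do |f|
        f.write(JSON.generate('id' => i))
        f.fsync # force this file's data to disk before moving on
      end
    end
  end

  records = nil
  read_time = Benchmark.measure do
    files = Dir.glob(File.join(root, 'record', '*', '*.json'))
    records = files.map { |file| JSON.parse(File.read(file)) }
  end

  puts write_time # user, system, total and (real) seconds, as in the output above
  puts read_time
  records.size
end

puts read_back
```

Forcing a sync per file makes writes slower, so the timings are not comparable with the figures above; only the record count is.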
License
The MIT license