This is a somewhat specialized chunk of code!
Using a lookup table in Redis and venti as the backing storage, webvac serves static content. It comes with utility ‘webvac-sweep`, that pushes content into venti, optionally removing it from the filesystem. The `webvac-server` program itself starts up a webserver. Give these programs ’-h’ or ‘–help’ to see helpful(?) information, and see below for configuration.
Currently I am only using it to serve uploads for UGC in some small- to moderately high-traffic Pleroma instances. I have been using venti to do incremental backups of the data (replication is speedy), and since the uploads are WORM (“write once, read many”) data, I thought it’d be cool to serve the data directly out of venti. It worked better than expected!
I expect to use this more often and thus expect it to become a bit more general as a result, but for right now, it makes a couple of assumptions about where it serves files.
Quick Start
[Check that your venti and Redis servers are operational.]
$ sudo ed /etc/webvac.json
a
{"server_path_prepend":"/where/uploads/get/put/in/the/filesystem"}
.
wq
$ sudo $EDITOR /etc/nginx/whatever
[edit your nginx configuration to proxy requests for files to localhost:8891
if the file doesn't exist]
$ webvac-server -D
$ find /where/uploads/get/put/in/the/filesystem -type f -print0 |
xargs -0 -n30 -P8 webvac-sweep -v --
[At this point, you should be able to rm the files in server_path_prepend
to start serving them out of venti. Big ones are slow for now.]
Mise en place
You will need a venti server and a Redis server. It’ll work fine with either Plan 9 venti or P9P venti. You will also need the ‘vac’ and ‘unvac’ utilities that come with P9P.
You will need to either have all the stuff in the default places, or you’ll need to configure a handful of things. Otherwise, you will need to configure probably one thing.
Overhead
Just empirically, for about 100GB of files, venti takes 86GB of disk (no surprise, since it’s mostly JPGs and MP4s, so it’s already compressed; the savings are probably from dedup), and Redis takes about 60MB of RAM for this. All of the files were put into venti as part of the backup solution. CPU overhead is negligible, the server takes about 200MB of RAM for both workers.
Installation
‘gem install webvac` should do it.
Configuration
The nginx config I used to test is in doc/examples.
You will need to create a JSON file that specifies the configuration. The first readable file from the following list is used:
-
The filename given in the $WEBVAC_CONFIG environment variable.
-
./config/webvac.json
-
$HOME/.webvac.json
-
/etc/webvac.json
Here is an annotated list of configuration options, all of which start with their default values. You will almost certainly need to change server_path_prepend.
{
// A Redis connection string, one that Redic will take:
"redis_url": "redis://localhost:6379/0",
// These two options are used to turn the request path into a filesystem
// path. Watch this space, I might turn server_path_strip into an array.
// The beginning of the path that the entire server sits under; that is,
// what should be stripped from the front of the path. For Pleroma, users'
// uploads are all under /media, so this works if you're serving uploads from
// https://$domain/media/$file . You should probably not change this one for
// now, as Rake routes depend on it.
"server_path_strip": "/media",
// The beginning of the path in the *filesystem* that corresponds to where
// the files are kept. It is *incredibly* unlikely that you are using the
// same path I am.
"server_path_prepend": "/media/block/fse",
// This is where the venti server is. (This might be an array at some
// point.) This address is in dial(2) syntax, by the way, so $host:$port
// should be written "tcp!$host!$port".
"venti_server": "localhost",
// Where the vac and unvac binaries are kept:
"plan9bin": "/opt/plan9/bin",
// This is to make UGC work safely out of the box. Hopefully no browser is
// dumb enough to run stuff inside <script> tags in something we're serving
// as text/plain:
"mime_substitutions": {
"text/html": "text/plain"
}
}
Usage
Afer configuring, you can run the server with ‘webvac-server`. This will actually serve the content from venti, as long as it is present in the path→score index in Redis (so you can remove content as needed by just removing items from the index). In order to add items, you run `webvac-sweep`. You can also delete the file (which will only happen if the sweep is successful) with `-d`.
TODO
See doc/TODO
See also:
“Venti: a new approach to archival storage”: doc.cat-v.org/plan_9/4th_edition/papers/venti/
“Pleroma”: pleroma.social/
I feel dirty
You can throw Bitcoin (BTC) at this address: 1BZz3ndJUoWhEvm1BfW3FzceAjFqKTwqWV . Proceeds will go to funding the instance hosting thing.