Gitset - A data collection and curation tool for humans
Gitset is essentially a key-value store. The key is a path to a file on disk and the value is the contents of that file. The goal is to make collaborating on creating and curating datasets as easy as it is to contribute to an open source code project.
Gitset is not intended to be used as a database; rather, it should be used to manage a canonical data source that is imported into a "real" database for production use. On your mark, gitset, go!
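The key-value model described above can be sketched with ordinary shell tools. This is only an illustration of the idea, not gitset's implementation: the helper names `ds_keys` and `ds_get` and the sample file are hypothetical.

```shell
#!/bin/sh
# Illustration only: treat a directory as a key-value store, where the key is
# a file path and the value is that file's contents.
# ds_keys and ds_get are hypothetical helpers, not gitset commands.

# List every key (relative file path) in the dataset, skipping git metadata.
ds_keys() {
    (cd "$1" && find . -type f ! -path './.git/*' | sed 's|^\./||' | sort)
}

# Print the value stored under a key.
ds_get() {
    cat "$1/$2"
}

# Build a throwaway dataset with one data point.
dataset=$(mktemp -d)
mkdir -p "$dataset/people"
printf 'name: Full Name\n' > "$dataset/people/example.yaml"

ds_keys "$dataset"                       # prints: people/example.yaml
ds_get "$dataset" people/example.yaml    # prints: name: Full Name
```

Because keys are just paths, the directory layout doubles as the dataset's namespace, and every change to a value is an ordinary file change that git can track.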
Installation
If you haven't already done so, set up Git; it's a prerequisite. Once you've done that, you can install gitset from RubyGems:
gem install gitset
Usage
Create a new dataset
- Create a project directory
mkdir /path/to/project
cd /path/to/project
- Create a YAML template to be used for new data points
# template.yaml
---
name: Full Name
emails:
- [email protected]
- [email protected]
- [email protected]
- Initialize the project with your template
gitset init template.yaml
Clone an existing dataset
gitset clone git://github.com/username/datasetname.git
Working with a dataset
- Add a new data point using the template
gitset create path/to/datapoint.yaml
- Edit the new data point, replacing the template's placeholder values
# path/to/datapoint.yaml
---
name: John Britton
emails:
- [email protected]
- Stage your changes
git add path/to/datapoint.yaml
- Commit your changes
git commit -m 'Added a person'
Use standard git commands to modify existing data points in your dataset and commit the changes.
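As a rough sketch of that edit-and-commit cycle, the script below changes an existing data point in a throwaway repository using plain git only. The repository location, file name, data values, and commit identity are all made up for the demo.

```shell
#!/bin/sh
# Illustration only: edit an existing data point and commit it with plain git.
# The repository, file name, and identity below are made up for the demo.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q

# Demo-only identity so the commits work on a machine without git config.
git config user.name "Example User"
git config user.email "user@example.com"

# Start from a committed data point.
mkdir -p people
printf 'name: Full Name\n' > people/example.yaml
git add people/example.yaml
git commit -q -m 'Added a person'

# Modify the data point, review the change, then stage and commit it.
printf 'name: Example Person\n' > people/example.yaml
git --no-pager diff --stat
git add people/example.yaml
git commit -q -m 'Updated a person'

# Count the commits on the current branch.
commits=$(git rev-list --count HEAD)
echo "$commits commits"    # prints: 2 commits
```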
BUT WAIT, THERE'S MORE!
- Branch the dataset
- Contribute to a dataset
- Merge contributions
- Filter the dataset
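The items above are listed without commands, so gitset's own interface for them is not shown here. As a hedged sketch of what they could look like with plain git and grep alone (an assumption, not gitset's documented behavior), the script below branches a throwaway dataset, adds a contribution, merges it back, and runs a crude filter. Branch names, file names, and data values are made up.

```shell
#!/bin/sh
# Illustration only: branch a dataset, add a contribution, merge it back, and
# filter it, using plain git and grep. All names are made up for the demo.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "user@example.com"

mkdir -p people
printf 'name: Full Name\n' > people/first.yaml
git add people/first.yaml
git commit -q -m 'Added a person'

# Remember the starting branch (its default name varies between git setups).
main_branch=$(git rev-parse --abbrev-ref HEAD)

# Branch the dataset and contribute a new data point on the branch.
git checkout -q -b add-second-person
printf 'name: Another Name\n' > people/second.yaml
git add people/second.yaml
git commit -q -m 'Added another person'

# Merge the contribution back into the starting branch.
git checkout -q "$main_branch"
git merge -q add-second-person

# A crude filter: list data points whose contents match a pattern.
grep -rl 'Another' people    # prints: people/second.yaml
```

In a shared project the contribution step would normally go through a fork or remote rather than a local branch; the local version is used here only to keep the sketch self-contained.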