Quandl Toolbelt
** The Quandl Toolbelt is currently in ALPHA TESTING. You are nevertheless welcome to try it. **
The Quandl toolbelt enables you to create and maintain time series data on Quandl.com. It is extremely simple to understand and use. (We use it to maintain the 8 million datasets currently on the site.)
Installation
Mac
Windows
Gemfile
In your Gemfile, add:
gem "quandl"
Login
Once the toolbelt is installed, the next step is to login to Quandl:
$ quandl login
Obtain your token from the API tab on this page: http://www.quandl.com/users/info
Token: ***************
You have successfully authenticated!
username: tammer1
email: [email protected]
If you have a Quandl password, you can also use quandl login --method password. (You might not have a Quandl password if you log in using GitHub, Google, LinkedIn or Twitter.)
Create a Dataset
Create data.csv that looks something like this:
code: AAA
name: My first dataset
description: It only has two rows.
-----
1999-12-31, 1.23
2000-01-01, 4.56
Now send it to Quandl:
$ quandl upload data.csv
You just created a dataset on Quandl.com: www.quandl.com/<your-username>/AAA
Update a Dataset
The new dataset will now exist on Quandl forever. You can send new data and/or update metadata whenever you want. For example, create data_update.csv:
code: AAA
description: I am updating this description.
--
2000-01-02, 99.9
Now send to Quandl:
$ quandl upload data_update.csv
Notice that the dataset now has three rows and a new description:
www.quandl.com/<your-username>/AAA
Delete a Dataset
You can delete the dataset:
$ quandl delete AAA
source_code … created_at: '2014-01-21T15:53:22Z'
Are you sure? (y/n)
y
OK 1241ms AAA
Scrapers and Other Data Producing Programs
As long as your program outputs Quandl flavored CSV as above, it is ready for use with the Quandl toolbelt. Consider this scraper, written in both Ruby (scraper1.rb) and Python (scraper1.py):
# This script pulls the history of Facebook (FB) stock price from Google.
# It then prints the data to the screen in CSV format.
# It prepends the CSV with Quandl metadata (code, name, description)
#
…
If I were to run this script on Jan 31, 2014, I would get this:
$ ruby scraper1.rb
code: FB
name: Facebook Stock Price
----
Date,Open,High,Low,Close,Volume
30-Jan-14,62.12,62.50,60.46,61.08,150438699
29-Jan-14,54.61,54.95,53.19,53.53,98089932
...
I can turn the output of this script into a Quandl dataset like this:
$ ruby scraper1.rb | quandl upload
or
$ python scraper1.py | quandl upload
If you download the script and run the above command, you would see the result here:
www.quandl.com/<your-username>/FB
You can pipe the script to quandl upload each day to keep the dataset up to date on Quandl.com. Every time you send data to an existing dataset, the new data is merged with what is already there. (Hence the Quandl toolbelt is ideal for daily data harvesting, loading entire datasets, or some combination of the two.)
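The scraper body is elided above, so as a rough illustration only (not the actual scraper1.rb, and with hard-coded sample rows standing in for the live Google fetch), a script producing output of that shape might look like:

```ruby
# Illustrative only -- not the actual scraper1.rb.
# Builds Quandl flavored CSV for FB from hard-coded sample rows
# instead of fetching live prices from Google.

rows = [
  "30-Jan-14,62.12,62.50,60.46,61.08,150438699",
  "29-Jan-14,54.61,54.95,53.19,53.53,98089932",
]

output = [
  "code: FB",
  "name: Facebook Stock Price",
  "----",
  "Date,Open,High,Low,Close,Volume",
  *rows,
].join("\n")

puts output
```

Anything that prints this shape to stdout can be piped straight into quandl upload.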
Scheduling Your Scripts
This feature is not ready for use yet. When it is ready, you will be able to send any script to Quandl. Quandl will then run the script on a schedule and send the output to quandl upload for you. You can (optionally) receive emails when the script succeeds or fails.
Many Datasets via One Input Stream
You can send multiple datasets with a single call to quandl upload. scraper2.rb (scraper2.py) produces the most recent closing data for two stocks:
code: AAPL
--
2000-01-15,88.32,...
code: MSFT
--
2000-01-15,44.20,...
Then
$ python scraper2.py | quandl upload
creates or updates both quandl.com/<your-username>/AAPL and quandl.com/<your-username>/MSFT.
You can send any number of datasets via one call to quandl upload.
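A minimal sketch of a script like scraper2 (illustrative only, with hard-coded rows rather than real closing data) could emit both datasets on one stream like this:

```ruby
# Illustrative only -- not the actual scraper2.rb.
# Emits two datasets on a single stream; each dataset gets its own
# metadata header followed by a dashed separator and its rows.

datasets = {
  "AAPL" => ["2000-01-15,88.32"],
  "MSFT" => ["2000-01-15,44.20"],
}

stream = datasets.map do |code, rows|
  (["code: #{code}", "--"] + rows).join("\n")
end.join("\n")

puts stream
```

Each new `code:` line simply starts the next dataset in the stream.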
Quandl Flavored CSV
Quandl "flavored" CSV is just plain vanilla CSV prepended with metadata in YAML format. Metadata is separated from data by a single line containing one or more dashes "-".
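As a sketch of the format, an illustrative helper (not part of the toolbelt itself) could assemble flavored CSV programmatically:

```ruby
# Illustrative helper -- not part of the toolbelt itself.
# Assembles a Quandl flavored CSV string: YAML metadata lines,
# a dashed separator, then plain CSV rows.
require "csv"

def quandl_flavored_csv(metadata, rows)
  header = metadata.map { |key, value| "#{key}: #{value}" }.join("\n")
  body   = rows.map(&:to_csv).join
  "#{header}\n----\n#{body}"
end

puts quandl_flavored_csv(
  { code: "AAA", name: "My first dataset" },
  [["1999-12-31", 1.23], ["2000-01-01", 4.56]]
)
```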
Quick Reference
Here is the entire specification by example for quick reference:
# This is a comment. Also note blank lines are allowed; they are simply ignored
code: A01 # only (uppercase) letters, numbers and "_" can be used
name: My Dataset
description: "This data is my dataset. Note the use of quotes so
that I can use two lines and also use the reserved character ':'"
reference_url: www.wsj.com/somewhere # any valid url
frequency: daily # frequency is inferred if you omit this field
private: true # true => only you can see the dataset on Quandl
----
Date,Price,Volume # if omitted on new dataset, default headings are created
2012-01-01,32.23 # the csv data. date can be almost any format you want
Metadata Specifications
Field | Description | Required? |
---|---|---|
code | a unique id for the dataset; uppercase letters, numbers and "_" are the only characters allowed | Required |
name | a name for the dataset | Strongly Recommended |
description | a description for the dataset | Recommended |
reference_url | an external URL where the data can be validated; most datasets on Quandl cite an external source to maximize credibility | Optional |
frequency | daily, weekly, monthly, quarterly or annual | Optional; inferred if omitted |
private | true or false (default false); private data is visible only to you | Optional |
Example Scrapers
Shibor
www.shibor.org publishes Shibor rates which Quandl republishes at www.quandl.com/TAMMER1/SHIBOR
This dataset is maintained via this Ruby script, which fetches the 10 most recent days of data from Shibor.org.
You can run the script to print 10 days of Shibor rates to the screen:
curl "https://raw.github.com/tammer/scrapers/master/shibor.rb" | ruby
To maintain this dataset on Quandl, we simply run the following on a daily basis:
curl "https://raw.github.com/tammer/scrapers/master/shibor.rb" | ruby | quandl upload
Each day 10 rows are sent to Quandl. Usually 9 of those rows are redundant, but that is harmless since we replace existing data with exactly the same data. Notice how old data is not affected by the updates.
The backfill for this dataset was manually downloaded and converted into a simple CSV file which we then pushed to the site:
quandl upload shibor_backfill.csv
Hsieh Trend Following Factors
Professor David Hsieh maintains hedge fund trend following risk factors at faculty.fuqua.duke.edu/~dah7/HFRFData.htm. They are available on Quandl at quandl.com/TAMMER1/TFRF.
The data is maintained by running hsieh.rb every day. To see the output of the script:
curl "https://raw.github.com/tammer/scrapers/master/hsieh.rb" | ruby
To keep the data up to date, we scheduled a daily run of:
curl "https://raw.github.com/tammer/scrapers/master/hsieh.rb" | ruby | quandl upload
Copyright Data
Some data publishers provide data on the condition that you not republish it. When scraping such sites, be sure to set the private flag to true so that only you can see the data. You should then be in compliance, since you are simply storing a single copy in a private, cloud-based repository, no different from storing a copy on Google Docs or Dropbox.
For example, if you happen to need the MSCI Far East Index on Quandl, you can scrape it with a program like this. You then pipe to Quandl as usual, ensuring the private flag is true:
curl "https://raw.github.com/tammer/scrapers/master/msci.rb" | ruby | quandl upload
Now you have the data you need on Quandl while remaining compliant with MSCI's terms of use.
Additional Examples
Dataset | Scraper |
---|---|
Litecoin vs USD | litecoin.rb |
Full Reference
Other features of the toolbelt, including quandl download, quandl info, quandl list and other minor features, are documented in the Quandl Toolbelt Reference page.
FAQ
How can I use ":" in the name or description field?
You should put the text in double quotes:
code: FOO
name: My Dataset
description: "I love colons : : :"
From Ruby:
puts "description: \"I love colons : : :\" "
or
puts ' description: "I love colons : : :" '
From Python:
print "description: \"I love colons : : :\""
Are the Datasets Publicly Accessible?
You decide. By default, datasets are public. Use:
private: true
to make the dataset visible only to you.
Can you handle high frequency (intra-day) data?
No.
How do I include blank or nil values?
This is how you include nil values:
code: NIL
name: Example Data with Missing Points
description: This dataset is for example only.
--
Date, High, Low, Mid
2005, 1, 2, 3
2004, 5, nil, 4
2003, ,,9
2002, 1, 2, N.a.
This dataset can be seen on Quandl right here
Your SHIBOR script seems to download the past 10 days' worth of data...
...Assuming that happens daily, then you'll have overlapping data (e.g., the most recent day's data is new, but the prior nine days worth of data should be in the database already). How does Quandl deal with that? What if the underlying data changes - will Quandl update the previous nine days of data? Will it record what the data used to be based on the 'original' dataset?
Answer: If you upload data for dates where data already exists, the new data overwrites the old data. Thus if you send redundant data, it is harmless. Shibor.rb is written this way for two reasons: 1) it is helpful in case the publisher revises something a few days later; 2) it is helpful if we miss a run for a couple of days for some reason.
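The overwrite-on-matching-date behaviour can be modelled as a simple hash merge (an illustrative model only, not actual Quandl server code):

```ruby
# Illustrative model of the merge-by-date behaviour described above --
# not actual Quandl server code. Rows are keyed by date; newly uploaded
# rows overwrite any existing rows that share a date.

existing = { "2014-01-28" => 55.9,  "2014-01-29" => 53.5 }
upload   = { "2014-01-29" => 53.53, "2014-01-30" => 61.08 } # one redundant date, one new

merged = existing.merge(upload) # on a date collision, the upload wins

puts merged
```

Redundant dates are harmlessly overwritten; dates absent from the upload are left untouched.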
A given municipal bond doesn't trade every day...
So, if I set up a separate 'id' for each bond, then each day there will be some bonds that get pricing updates and others that don't. Are there any issues with this, or can Quandl handle this kind of 'sparse' data?
Answer: Sparse data is not a problem.
Why can't I find my dataset using search on Quandl?
If it is private, it will not appear in search ever. If it is public, it can take up to 1 hour before our index is updated with your new dataset.
My Question is not answered!
Best to email me then. Put "Toolbelt" in the subject and you'll go right to the top of my inbox.