Quandl Toolbelt
** The Quandl Toolbelt is currently in ALPHA TESTING. You are nevertheless welcome to try it. **
The Quandl toolbelt enables you to create and maintain time series data on Quandl.com. It is extremely simple to understand and use. (We use it to maintain the 8 million datasets currently on the site.)
Installation
Mac
Windows
Gemfile
In your Gemfile, add:
gem "quandl"
Login
Once the toolbelt is installed, the next step is to login to Quandl:
$ quandl login
Obtain your token from the API tab on this page: http://www.quandl.com/users/info
Token: ***************
You have successfully authenticated!
username: tammer1
email: [email protected]
If you have a Quandl password, you can also use quandl login --method password. (You might not have a Quandl password if you log in using GitHub, Google, LinkedIn or Twitter.)
Create a Dataset
Create data.csv that looks something like this:
code: AAA
name: My first dataset
description: It only has two rows.
-----
1999-12-31, 1.23
2000-01-01, 4.56
Now send it to Quandl:
$ quandl upload data.csv
You just created a dataset on Quandl.com: www.quandl.com/<your-username>/AAA
Update a Dataset
The new dataset will now exist on Quandl forever. You can send new data and/or update metadata whenever you want. For example, create data_update.csv:
code: AAA
description: I am updating this description.
--
2000-01-02, 99.9
Now send to Quandl:
$ quandl upload data_update.csv
Notice that the dataset now has three rows and a new description:
www.quandl.com/<your-username>/AAA
Delete a Dataset
You can delete the dataset:
$ quandl delete AAA
source_code … created_at: '2014-01-21T15:53:22Z'
Are you sure? (y/n)
y
OK 1241ms AAA
Scrapers and Other Data Producing Programs
As long as your program outputs Quandl flavored CSV as above, it is ready for use with the Quandl toolbelt. Consider this scraper, written in both Ruby (scraper1.rb) and Python (scraper1.py):
# This script pulls the history of Facebook (FB) stock price from Google.
# It then prints the data to the screen in CSV format.
# It prepends the CSV with Quandl metadata (code, name, description)
#
…
If I were to run this script on Jan 31, 2014, I would get this:
$ ruby scraper1.rb
code: FB
name: Facebook Stock Price
----
Date,Open,High,Low,Close,Volume
30-Jan-14,62.12,62.50,60.46,61.08,150438699
29-Jan-14,54.61,54.95,53.19,53.53,98089932
...
I can turn the output of this script into a Quandl dataset like this:
$ ruby scraper1.rb | quandl upload
or
$ python scraper1.py | quandl upload
If you download the script and run the above command, you would see the result here:
www.quandl.com/<your-username>/FB
You can pipe the script to quandl upload each day to keep the dataset up to date on Quandl.com. Every time you send data to an existing dataset, the new data is merged with what is already there. (Hence the Quandl toolbelt is ideal for daily data harvesting, loading entire datasets, or some combination of the two.)
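The scraper body is elided above, so as a rough illustration only (not the actual scraper1.rb, and with hard-coded sample rows standing in for the live Google fetch), a script producing output of that shape might look like:

```ruby
# Illustrative only -- not the actual scraper1.rb.
# Builds Quandl flavored CSV for FB from hard-coded sample rows
# instead of fetching live prices from Google.

rows = [
  "30-Jan-14,62.12,62.50,60.46,61.08,150438699",
  "29-Jan-14,54.61,54.95,53.19,53.53,98089932",
]

output = [
  "code: FB",
  "name: Facebook Stock Price",
  "----",
  "Date,Open,High,Low,Close,Volume",
  *rows,
].join("\n")

puts output
```

Anything that prints this shape to stdout can be piped straight into quandl upload.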
Scheduling Your Scripts
This feature is not ready for use yet. When it is ready, you will be able to send any script to Quandl. Quandl will then run the script on a schedule and send the output to quandl upload for you. You can (optionally) receive emails when the script succeeds or fails.
Many Datasets via One Input Stream
You can send multiple datasets with a single call to quandl upload. scraper2.rb (scraper2.py) produces the most recent closing data for two stocks:
code: AAPL
--
2000-01-15,88.32,...
code: MSFT
--
2000-01-15,44.20,...
Then
$ python scraper2.py | quandl upload
creates or updates both quandl.com/<your-username>/AAPL and quandl.com/<your-username>/MSFT.
You can send any number of datasets via one call to quandl upload.
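A minimal sketch of a script like scraper2 (illustrative only, with hard-coded rows rather than real closing data) could emit both datasets on one stream like this:

```ruby
# Illustrative only -- not the actual scraper2.rb.
# Emits two datasets on a single stream; each dataset gets its own
# metadata header followed by a dashed separator and its rows.

datasets = {
  "AAPL" => ["2000-01-15,88.32"],
  "MSFT" => ["2000-01-15,44.20"],
}

stream = datasets.map do |code, rows|
  (["code: #{code}", "--"] + rows).join("\n")
end.join("\n")

puts stream
```

Each new `code:` line simply starts the next dataset in the stream.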
Quandl Flavored CSV
Quandl "flavored" CSV is just plain vanilla CSV prepended with metadata in YAML format. Metadata is separated from data by a single line containing one or more dashes "-".
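As a sketch of the format, an illustrative helper (not part of the toolbelt itself) could assemble flavored CSV programmatically:

```ruby
# Illustrative helper -- not part of the toolbelt itself.
# Assembles a Quandl flavored CSV string: YAML metadata lines,
# a dashed separator, then plain CSV rows.
require "csv"

def quandl_flavored_csv(metadata, rows)
  header = metadata.map { |key, value| "#{key}: #{value}" }.join("\n")
  body   = rows.map(&:to_csv).join
  "#{header}\n----\n#{body}"
end

puts quandl_flavored_csv(
  { code: "AAA", name: "My first dataset" },
  [["1999-12-31", 1.23], ["2000-01-01", 4.56]]
)
```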
Quick Reference
Here is the entire specification by example for quick reference:
# This is a comment. Also note blank lines are allowed; they are simply ignored
code: A01 # only (uppercase) letters, numbers and "_" can be used
name: My Dataset
description: "This data is my dataset. Note the use of quotes so
that I can use two lines and also use the reserved character ':'"
reference_url: www.wsj.com/somewhere # any valid url
frequency: daily # frequency is inferred if you omit this field
private: true # true => only you can see the dataset on Quandl
----
Date,Price,Volume # if omitted on new dataset, default headings are created
2012-01-01,32.23 # the csv data. date can be almost any format you want
Metadata Specifications
Field | Description | Required? |
---|---|---|
code | a unique id for the dataset; uppercase letters, numbers and "_" are the only characters allowed | Required |
name | a name for the dataset | Strongly Recommended |
description | a description for the dataset | Recommended |
reference_url | an external URL where the data can be validated; most datasets on Quandl cite an external source to maximize credibility | Optional |
frequency | daily, weekly, monthly, quarterly or annual | Optional; inferred if omitted |
private | true or false (default false); private data is visible only to you | Optional |
Example Scrapers
Shibor
www.shibor.org publishes Shibor rates which Quandl republishes at www.quandl.com/TAMMER1/SHIBOR
This dataset is maintained via this Ruby script, which fetches the 10 most recent days of data from Shibor.org.
You can run the script to print 10 days of Shibor rates to the screen:
curl "https://raw.github.com/tammer/scrapers/master/shibor.rb" | ruby
To maintain this dataset on Quandl, we simply run the following on a daily basis:
curl "https://raw.github.com/tammer/scrapers/master/shibor.rb" | ruby | quandl upload
Each day 10 rows are sent to Quandl. Usually 9 of those rows are redundant, but that is harmless since we replace existing data with exactly the same data. Notice how old data is not affected by the updates.
The backfill for this dataset was manually downloaded and converted into a simple CSV file which we then pushed to the site:
quandl upload shibor_backfill.csv
Hsieh Trend Following Factors
Professor David Hsieh maintains hedge fund trend following risk factors at faculty.fuqua.duke.edu/~dah7/HFRFData.htm. They are available on Quandl at quandl.com/TAMMER1/TFRF.
The data is maintained by running hsieh.rb every day. To see the output of the script:
curl "https://raw.github.com/tammer/scrapers/master/hsieh.rb" | ruby
To keep the data up to date, we scheduled a daily run of:
curl "https://raw.github.com/tammer/scrapers/master/hsieh.rb" | ruby | quandl upload
Copyright Data
Some data publishers provide data on the condition that you not republish it. When scraping such sites, be sure to set the private flag to true so that only you can see the data. You should then be in compliance, since you are simply storing a single copy in a private, cloud-based repository, no different from storing a copy on Google Docs or Dropbox.
For example, if you happen to need the MSCI Far East Index on Quandl, you can scrape it with a program like this. You then pipe to Quandl as usual, ensuring the private flag is true:
curl "https://raw.github.com/tammer/scrapers/master/msci.rb" | ruby | quandl upload
Now you have the data you need on Quandl while remaining compliant with MSCI's terms of use.
Additional Examples
Dataset | Scraper |
---|---|
Litecoin vs USD | litecoin.rb |
Full Reference
Other features of the toolbelt, including quandl download, quandl info, quandl list and other minor features, are documented in the Quandl Toolbelt Reference page.
FAQ
How can I use ":" in the name or description field?
You should put the text in double quotes:
code: FOO
name: My Dataset
description: "I love colons : : :"
From Ruby:
puts "description: \"I love colons : : :\" "
or
puts ' description: "I love colons : : :" '
From Python:
print "description: \"I love colons : : :\""
Are the Datasets Publicly Accessible?
You decide. By default, datasets are public. Use:
private: true
to make the dataset visible only to you.
Can you handle high frequency (intra-day) data?
No.
How do I include blank or nil values?
This is how you include nil values:
code: NIL
name: Example Data with Missing Points
description: This dataset is for example only.
--
Date, High, Low, Mid
2005, 1, 2, 3
2004, 5, nil, 4
2003, ,,9
2002, 1, 2, N.a.
This dataset can be seen on Quandl right here
Your SHIBOR script seems to download the past 10 days' worth of data...
...Assuming that happens daily, then you'll have overlapping data (e.g., the most recent day's data is new, but the prior nine days worth of data should be in the database already). How does Quandl deal with that? What if the underlying data changes - will Quandl update the previous nine days of data? Will it record what the data used to be based on the 'original' dataset?
Answer: If you upload data for dates where data already exists, the new data overwrites the old data. Thus if you send redundant data, it is harmless. Shibor.rb is written this way for two reasons: 1) it is helpful in case the publisher revises something a few days later; 2) it is helpful if we miss a run for a couple of days for some reason.
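The overwrite-on-matching-date behaviour can be modelled as a simple hash merge (an illustrative model only, not actual Quandl server code):

```ruby
# Illustrative model of the merge-by-date behaviour described above --
# not actual Quandl server code. Rows are keyed by date; newly uploaded
# rows overwrite any existing rows that share a date.

existing = { "2014-01-28" => 55.9,  "2014-01-29" => 53.5 }
upload   = { "2014-01-29" => 53.53, "2014-01-30" => 61.08 } # one redundant date, one new

merged = existing.merge(upload) # on a date collision, the upload wins

puts merged
```

Redundant dates are harmlessly overwritten; dates absent from the upload are left untouched.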
A given municipal bond doesn't trade every day...
So, if I set up a separate 'id' for each bond, then each day there will be some bonds that get pricing updates and others that don't. Are there any issues with this, or can Quandl handle this kind of 'sparse' data?
Answer: Sparse data is not a problem.
Why can't I find my dataset using search on Quandl?
If it is private, it will not appear in search ever. If it is public, it can take up to 1 hour before our index is updated with your new dataset.
My Question is not answered!
Best to email me then. Put "Toolbelt" in the subject and you'll go right to the top of my inbox.