Chimps on the Command Line

Infochimps is an online data marketplace and repository where anyone can find, share, and sell data.

Infochimps offers two APIs for users to access and modify data:

  • a Dataset API to list, show, create, update, and destroy datasets and associated resources on Infochimps

  • a Query API to query data from particular rows of these datasets

Chimps is a Ruby wrapper for these APIs that makes interacting with them simple. You can embed Chimps inside your web application or any other software you write.

But if you find yourself wishing that you could make queries, create datasets, &c. from your command line, where you already live, where you already keep your data…then Chimps CLI is for you:

# See your datasets
$ chimps list --my
...

# Create a new dataset
$ chimps create title="A Brand New Dataset" description="That I created in 2 minutes.  But that doesn't mean it's not awesome."
...

# Take a look
$ chimps show a-brand-new-dataset

# Check out your competition
$ chimps search awesome data
...

# Hmmm.  Better do some more work.
$ chimps update a-brand-new-dataset tag_list="awesome,new,data"

First Steps

Installing Chimps CLI

Assuming you’ve already set up your Gem sources, just run

gem install chimps-cli

This will also install Chimps if it’s not already present on your system.

Configuring Chimps

Chimps CLI is just a command-line wrapper for the Chimps library. If Chimps is already configured with your API credentials, Chimps CLI will read them without any further setup.

If you need to obtain API keys for either the Dataset API or the Query API then sign up at Infochimps.

You’ll need to put your API keys into one of two files, either /etc/chimps/chimps.yaml or ~/.chimps. See the README for Chimps for more details on how to set up these configuration files.
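Before going further it can help to check which of the two configuration files is actually present. The Ruby snippet below is only a sketch: existing_chimps_configs is a hypothetical helper, not part of Chimps. It reports which of the two files exist and makes no claim about which file Chimps prefers or which keys it expects.

```ruby
# Paths named in this README; nothing else about them is assumed here.
CHIMPS_CONFIG_PATHS = ["/etc/chimps/chimps.yaml", "~/.chimps"].freeze

# Hypothetical helper (not part of Chimps): return the candidate
# configuration files that exist on this machine, with "~" expanded.
def existing_chimps_configs(paths = CHIMPS_CONFIG_PATHS)
  paths.map { |path| File.expand_path(path) }
       .select { |path| File.file?(path) }
end

found = existing_chimps_configs
if found.empty?
  puts "No Chimps configuration file found."
else
  puts "Found: #{found.join(', ')}"
end
```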

Usage

Try running

$ chimps help

to make sure you can run the chimps command and to see an overview of what subcommands are available. You can get more detailed help as well as example usage on COMMAND by running

$ chimps help COMMAND

You can check whether your credentials are valid using the test command:

$ chimps test
Authenticated as user 'Infochimps' for Infochimps Dataset API at http://www.infochimps.com
Authenticated for Infochimps Query API at http://api.infochimps.com

If you get messages about missing keys and so on, go back and read the Chimps installation instructions.

If you get messages about not being able to authenticate, double-check that the API keys in your configuration file (either ~/.chimps or /etc/chimps/chimps.yaml) match the credentials listed in your profile.

Options

chimps commands accept arguments as well as options. Options always begin with two dashes, and some options have single-letter flags as well.

Some options work for every chimps command. --verbose (-v), for example, is a great way to see what underlying HTTP request(s) a given command is making.

Operating on a Dataset, Source, License, &c.

Many requests can operate on a particular resource. The show command, for example, can be used to show a dataset (the default choice), a license, a source, or a user.

You can see what resources COMMAND can operate on with chimps help COMMAND. A few examples:

# Will attempt to show the Dataset 'an-example'
$ chimps show an-example

# Will attempt to show the Source 'an-example'
$ chimps show source an-example

# Will search datasets for 'stocks'
$ chimps search stocks

# Will list all licenses
$ chimps list licenses

Providing Data to a Command

Some commands (typically those that result in HTTP GET and DELETE requests) don’t require you to pass any data to Infochimps.

Other commands (typically those that result in HTTP POST and PUT requests) do. These commands usually create or modify a dataset or other resource at Infochimps.

Say you wanted to create a new dataset on Infochimps with the title “List of hottest Salsas” and with description “All salsas were tried personally by me.”

There are two methods you can use to pass this data to a Chimps CLI command:

1) You can put the data you need to pass into a file on disk. Chimps understands the YAML and JSON file formats and will automatically parse and serialize them properly when making a request. You could create the following file

# in salsa_dataset.yml
---
title: "List of hottest Salsas"
description: |-
  All salsas were tried personally by me.

and you can create the dataset with

$ chimps create --data=salsa_dataset.yml

2) You can pass parameters and values directly on the command line. You could create the same dataset as above with

$ chimps create title="List of hottest Salsas" description="All salsas were tried personally by me."

This will only work for a flat collection of parameters and values, as in this example. If you need to pass a nested data structure you should use a file and the --data option above.
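The difference between the two methods is easiest to see by looking at the data each one can express. The Ruby sketch below is illustrative only: parse_params is a hypothetical helper showing one way key=value arguments could map to a flat hash, and it is not Chimps' actual parser. The nested "details" key is likewise made up for the example.

```ruby
require "yaml"

# Hypothetical helper (not Chimps' real parser): turn arguments of the
# form key=value into a flat hash. Splitting on the first "=" only keeps
# any later "=" characters inside the value.
def parse_params(args)
  args.each_with_object({}) do |arg, params|
    key, value = arg.split("=", 2)
    raise ArgumentError, "expected key=value, got #{arg.inspect}" if value.nil?
    params[key] = value
  end
end

from_args = parse_params([
  "title=List of hottest Salsas",
  "description=All salsas were tried personally by me."
])

from_file = YAML.safe_load(<<~DOC)
  ---
  title: "List of hottest Salsas"
  description: |-
    All salsas were tried personally by me.
DOC

# Both routes describe the same flat collection of parameters.
puts from_args == from_file

# A nested structure (the "details" key is made up) has no obvious
# key=value spelling, which is why it belongs in a file.
nested = YAML.safe_load(<<~DOC)
  title: "List of hottest Salsas"
  details:
    tasted_by: me
DOC
puts nested["details"]["tasted_by"]
```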

Here is another example, which makes a query to the Query API and returns demographics on an IP address:

$ chimps query web/an/ip_census ip=67.78.118.7

Basic HTTP Verbs

Infochimps’ Dataset API is RESTful so it respects the semantics of HTTP verbs. You can use this “lower-level” interface to make simple GET, POST, PUT, and DELETE requests.

Here’s how to return information on a Yahoo! Stocks dataset:

$ chimps get /datasets/yahoo-stock-search

The default response is JSON, but you can request a different format by passing --response_format with one of xml, json, or yaml. This works for (almost) all Dataset API requests.

$ chimps get /datasets/yahoo-stock-search --response_format=yaml
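If you capture the output of a command like the ones above in a script, you will want to parse it according to the format you asked for. Here is a minimal Ruby sketch using only the standard library. parse_response is a hypothetical helper, not part of Chimps, and the sample bodies are made up for illustration rather than real Infochimps output. XML is left out to keep the example dependency-free.

```ruby
require "json"
require "yaml"

# Hypothetical helper (not part of Chimps): parse a response body that was
# requested with --response_format=json or --response_format=yaml.
def parse_response(body, format)
  case format.to_s
  when "json" then JSON.parse(body)
  when "yaml" then YAML.safe_load(body)
  else raise ArgumentError, "unsupported response format: #{format}"
  end
end

# Made-up sample bodies; real responses will have different fields.
json_body = '{"title": "An example"}'
yaml_body = "---\ntitle: An example\n"

puts parse_response(json_body, :json)["title"]
puts parse_response(yaml_body, :yaml)["title"]
```

YAML.safe_load rejects Ruby-specific types such as symbols unless you explicitly permit them, so a real response may need a more permissive call.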

Try running chimps help for the get command (chimps help get) as well as for the post, put, and delete commands.

Signed vs. Unsigned Requests

Some requests, like the GET request above, don’t need to be signed in any way: using chimps to make a simple unsigned GET request is no different from doing it with curl.
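To make that concrete, here is roughly what an unsigned GET looks like without Chimps, sketched in Ruby with the standard library. The host is the one printed by chimps test earlier in this README. dataset_uri and unsigned_get are hypothetical helpers, and the exact URL the Dataset API expects (for example, how a response format is selected) is an assumption you should check against www.infochimps.com/apis.

```ruby
require "net/http"
require "uri"

# Host as printed by `chimps test` in this README.
DATASET_API_HOST = "http://www.infochimps.com"

# Hypothetical helper: build the URI for a Dataset API path such as
# "/datasets/yahoo-stock-search".
def dataset_uri(path)
  URI.join(DATASET_API_HOST, path)
end

# Hypothetical helper: plain, unsigned GET. No API keys are involved,
# which is why this is equivalent to fetching the URL with curl.
def unsigned_get(path)
  Net::HTTP.get_response(dataset_uri(path))
end

# Only the URI is printed here; no request is sent.
puts dataset_uri("/datasets/yahoo-stock-search")
```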

All POST, PUT, and DELETE requests, however, need to be signed, and Chimps makes that easy. Here’s how you might create a dataset:

$ chimps post --sign /datasets title="My dataset" description="Some text..."

If you leave out the --sign option then the request will fail with a 401 Authentication error.

The above request is really just the same as

$ chimps create title="My dataset" description="Some text..."

which is a little simpler because create understands what you’re trying to do and internally constructs the appropriate POST request.

You can find a list of all available requests, the correct HTTP verb to use, whether the request needs to be signed, and what parameters it accepts at www.infochimps.com/apis.

To round out this section, here’s an example of a PUT request and a DELETE request (both of which must be signed):

# Update your existing dataset
$ chimps put --sign /datasets/my-dataset title="A New Title"

# Let's delete this dataset now because we're fickle little monkeys...
$ chimps delete --sign /datasets/my-dataset

Most things you might want to do with this “low-level” HTTP verb interface can be done with specialized chimps commands. Read on.

Core REST Actions

Since the Infochimps Dataset API is RESTful, it implements list, show, create, update, and destroy actions for all resources. Each of these actions has a corresponding Chimps command.

Here’s how to list datasets:

$ chimps list

The list command is one of a few (search being another) that accept the --my (-m) option. This restricts the output to only the datasets (or whatever resource you’re listing) that are owned by you.

$ chimps list --my
$ chimps list --my licenses

Here’s how to show a dataset:

$ chimps show my-dataset

This returns YAML by default, but you can specify a different response format by passing the --response_format option:

$ chimps show my-dataset --response_format=json

You’ve already seen create in action a few times, so here’s update instead:

$ chimps update my-dataset title="A new title"

And of course destroy:

$ chimps destroy my-dataset

If you’re curious about the underlying HTTP requests being sent, try running these commands with the --verbose (-v) flag.
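If you drive these commands from a Ruby script, passing the arguments as an array avoids shell-quoting problems with values that contain spaces. This is a minimal sketch: chimps_argv is a hypothetical helper, not part of Chimps, and it only assembles the command forms shown above. The final system call is commented out because it would modify a real dataset.

```ruby
# Hypothetical helper (not part of Chimps): assemble the argument list for
# a chimps subcommand, an optional resource identifier, and key=value
# parameters, in the order used by the examples in this README.
def chimps_argv(command, identifier = nil, params = {})
  argv = ["chimps", command.to_s]
  argv << identifier if identifier
  params.each { |key, value| argv << "#{key}=#{value}" }
  argv
end

argv = chimps_argv(:update, "my-dataset", title: "A new title")
p argv

# Passing the array to system bypasses the shell, so the space in the
# title needs no extra quoting. Uncomment to actually run the command.
# system(*argv)
```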

Special Requests

Chimps CLI has a few special commands which aren’t HTTP verbs or core REST actions.

Here’s how to search Infochimps for datasets about music:

$ chimps search music

Here’s the same search restricted to only datasets you own and pretty-printed:

$ chimps search --my music --pretty

Download

If a dataset on Infochimps has a downloadable package then the download command can be used to download the data:

$ chimps download daily-1970-2010-open-close-hi-low-and-volume-nyse-exchange

The dataset must be free, you must own it, or you must have purchased it (through the website) before you can download it with Chimps.

You may want to include the --verbose (-v) flag so that you can see the progress of the download, especially if it is a large file.

Upload

Infochimps does not presently allow you to upload data by using an API. Please create a dataset first (you can do this with Chimps) and then go to that dataset’s page in a browser and upload any data you wish.

This feature will be coming very, very soon!

Help/Test

chimps help and chimps help COMMAND should carry you a good ways with the examples and usage they output.

chimps test should confirm that your API keys are properly configured.

Contributing

Chimps CLI is an open source project created by the Infochimps team to encourage adoption of the Infochimps APIs. The official repository is hosted on GitHub:

http://github.com/infochimps/chimps-cli

Feel free to clone it and send pull requests.