Wukong-Load
This Wukong plugin makes it easy to load data from the command-line into various data stores.
It is assumed that you will independently deploy and configure each data store yourself (but see Ironfan). Once you've done that, and once you've written some dataflows with Wukong, you can load their output into your data stores with wu-load.
Wukong-Load is not intended for production use. It is meant as a tool to quickly load data into data stores over the command-line, which is especially useful when developing flows in concert with wu-local.
Installation & Setup
Wukong-Load can be installed as a RubyGem:
$ sudo gem install wukong-load
Usage
Wukong-Load provides a command-line program, wu-load, that you can use to load data fed in over STDIN. Get help on wu-load by running
$ wu-load --help
and get help for a specific data store with
$ wu-load store_name --help
Further details will depend on the data store you're writing to.
Expected Input
All input to wu-load should be newline-separated, JSON-formatted, hash-like records: one JSON object per line. For some data stores, keys in the record may be interpreted as metadata about the record or about how to route the record within the data store.
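As a minimal sketch of what such input looks like, the following Ruby script prints two hash-like records as newline-separated JSON. The record contents and the helper name `to_ndjson` are made up for illustration; they are not part of Wukong-Load.

```ruby
require 'json'

# Hypothetical sample records; the field names are illustrative only.
RECORDS = [
  { "title" => "Walden",    "author" => "Henry David Thoreau" },
  { "title" => "Moby-Dick", "author" => "Herman Melville" }
].freeze

# Serializes each hash as compact JSON, one record per line.
def to_ndjson(records)
  records.map { |record| JSON.generate(record) }.join("\n") + "\n"
end

print to_ndjson(RECORDS)
```

Saved under a name of your choosing, the script's output can be redirected to a file such as data.json or piped straight into wu-load, since wu-load reads from STDIN.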
Elasticsearch Usage
Lets you load JSON-formatted records into an Elasticsearch database. See full options with
$ wu-load elasticsearch --help
Connecting
wu-load tries to connect to an Elasticsearch server at a default host (localhost) and port (9200). You can change these:
$ cat data.json | wu-load elasticsearch --host=10.122.123.124 --port=80
All queries will be sent to this address.
Routing
Elasticsearch stores data in several indices, each of which contains documents of various types.
wu-load loads each document into a default index (wukong) and type (streaming_record), but you can change these:
$ cat data.json | wu-load elasticsearch --host=10.123.123.123 --index=publication --es_type=book
A record with an _index or _es_type field will override these default settings. You can change the names of the fields used.
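A short sketch of per-record routing, assuming the default field names _index and _es_type described above. The helper `with_es_routing`, the record contents, and the index and type names are hypothetical.

```ruby
require 'json'

# Returns a copy of the record tagged with per-record routing fields.
# The field names assume wu-load's defaults (_index and _es_type).
def with_es_routing(record, index:, es_type:)
  record.merge("_index" => index, "_es_type" => es_type)
end

# Hypothetical documents; only the first two carry routing fields.
BOOK  = with_es_routing({ "title" => "Walden" },     index: "publication", es_type: "book")
ESSAY = with_es_routing({ "title" => "On Liberty" }, index: "publication", es_type: "essay")
PLAIN = { "title" => "No routing fields" } # falls back to the command-line or built-in defaults

[BOOK, ESSAY, PLAIN].each { |record| puts JSON.generate(record) }
```

Mixing routed and unrouted records in one stream lets a single wu-load invocation write to several index and type combinations.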
Creates vs. Updates
If an input document contains a value for the field _id, then that value will be used as the ID of the record when written, possibly overwriting a record that already exists (an update).
You can change the field you use for the Elasticsearch ID property:
$ cat data.json | wu-load elasticsearch --host=10.123.123.123 --index=media --es_type=books --id_field="ISBN"
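The overwrite-by-ID behavior can be pictured with a toy in-memory model. This is only a sketch of the semantics described above, not how wu-load or Elasticsearch is implemented; `apply_writes` and the generated keys are hypothetical, and treating a record without an ID as a fresh create is an assumption.

```ruby
# Toy model: a Hash stands in for the data store, keyed by record ID.
def apply_writes(records, id_field: "_id")
  store = {}
  records.each_with_index do |record, i|
    # Assumption: a record without an ID gets a fresh, store-generated one.
    key = record.fetch(id_field) { "generated-#{i}" }
    store[key] = record # a repeated key replaces the earlier record (an update)
  end
  store
end

# Hypothetical input: two versions of one document plus a record without an ID.
WRITES = [
  { "_id" => "book-1", "title" => "Walden", "in_stock" => true },
  { "_id" => "book-1", "title" => "Walden", "in_stock" => false },
  { "title" => "Moby-Dick" }
].freeze

store = apply_writes(WRITES)
puts store.size                 # two distinct records remain
puts store["book-1"]["in_stock"] # the later write won
```

Passing `id_field: "ISBN"` to the model mirrors what the --id_field option does on the command line.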
Kafka Usage
Lets you load JSON-formatted records into a Kafka queue. See full options with
$ wu-load kafka --help
Connecting
wu-load tries to connect to a Kafka broker at a default host (localhost) and port (9092). You can change these:
$ cat data.json | wu-load kafka --host=10.122.123.124 --port=1234
All records will be sent to this address.
Routing
Kafka stores data in several named queues (topics). Each queue can have several numbered partitions.
wu-load loads each record into the default queue (test) and partition (0), but you can change these:
$ cat data.json | wu-load kafka --host=10.123.123.123 --topic=messages --partition=6
A record with a _topic or _partition field will override these default settings. You can change the names of the fields used.
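As an illustration of per-record routing for Kafka, this sketch tags each record with a topic and spreads records across partitions round-robin. It assumes the default field names _topic and _partition; the helper `with_kafka_routing`, the round-robin policy, and the sample events are hypothetical choices for this example.

```ruby
require 'json'

# Tags each record with a topic and a round-robin partition number.
# Assumes the topic already has at least `partitions` partitions.
def with_kafka_routing(records, topic:, partitions:)
  records.each_with_index.map do |record, i|
    record.merge("_topic" => topic, "_partition" => i % partitions)
  end
end

# Hypothetical events.
EVENTS = [{ "msg" => "a" }, { "msg" => "b" }, { "msg" => "c" }].freeze

with_kafka_routing(EVENTS, topic: "messages", partitions: 2).each do |record|
  puts JSON.generate(record)
end
```

Because the routing lives in the records themselves, the --topic and --partition options only need to cover records that carry no routing fields.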
MongoDB Usage
Lets you load JSON-formatted records into a MongoDB database. See full options with
$ wu-load mongodb --help
Connecting
wu-load tries to connect to a MongoDB server at a default host (localhost) and port (27017). You can change these:
$ cat data.json | wu-load mongodb --host=10.122.123.124 --port=1234
All queries will be sent to this address.
Routing
MongoDB stores documents in several databases, each of which contains collections.
wu-load loads each document into a default database (wukong) and collection (streaming_record), but you can change these:
$ cat data.json | wu-load mongodb --host=10.123.123.123 --database=publication --collection=book
A record with a _database or _collection field will override these default settings. You can change the names of the fields used.
Creates vs. Updates
If an input document contains a value for the field _id, then that value will be used as the ID of the record when written, possibly overwriting a record that already exists (an update).
You can change the field you use for the MongoDB ID property:
$ cat data.json | wu-load mongodb --host=10.123.123.123 --database=media --collection=books --id_field="ISBN"
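When loading with a custom ID field such as ISBN, it can help to check beforehand which records actually carry that field, since records without it cannot be matched to an existing document by ID. The following is one possible pre-flight check; `split_by_id_field` and the sample records (with placeholder ISBN values) are hypothetical.

```ruby
require 'json'

# Splits records into [those with a non-nil ID field, those without].
def split_by_id_field(records, id_field)
  records.partition { |record| !record[id_field].nil? }
end

# Hypothetical records; the ISBN values are placeholders, not real ISBNs.
BOOKS = [
  { "ISBN" => "isbn-placeholder-1", "title" => "Walden" },
  { "title" => "Untitled manuscript" },
  { "ISBN" => "isbn-placeholder-2", "title" => "Moby-Dick" }
].freeze

keyed, unkeyed = split_by_id_field(BOOKS, "ISBN")
keyed.each { |record| puts JSON.generate(record) } # safe to load with --id_field="ISBN"
warn "#{unkeyed.size} record(s) have no ISBN"      # reported on STDERR, not loaded
```

Only the keyed records go to STDOUT, so the script can sit directly in front of wu-load in a pipeline while the warning stays visible on STDERR.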