HBase output plugin for Fluent event collector
Overview
HBase output plugin buffers event logs in local file and puts it to HBase periodically.
Installation
Simply use RubyGems:
gem install fluent-plugin-hbase
Configuration
<match pattern>
type hbase
include_time_key TRUE_OR_FALSE
time_key KEY_IN_RECORD
include_tag_key TRUE_OR_FALSE
tag_key KEY_IN_RECORD
fields_to_columns_mapping MAPPING_FROM_JSON_FIELDS_TO_HBASE_COLUMNS
hbase_host YOUR_HBASE_HOST
hbase_port YOUR_HBASE_PORT
hbase_table YOUR_HBASE_TABLE_NAME
# DEPRECATED
# The below configuration keys are deprecated to
# stay along with fluentd's well-known configuration keys.
# Instead, use:
# - include_time_key
# - time_key
# - include_tag_key
# - tag_key
# - fields_to_columns_mapping
tag_column_name HBASE_COLUMN
time_column_name HBASE_COLUMN
</match>
- include_time_key (optional)
-
If you need to write each event’s time to HBase, set this true
and add a mapping from the ‘time_key’ to HBase column to the ‘fields_to_columns_mapping’ configuration.
The default is false.
- time_key (optional)
-
The key where each event’s time is kept for mapping to a HBase column.
You MUST add a mapping from the key configured here to a HBase column, to actually write times to HBase.
The default is “time”.
- include_tag_key (optional)
-
If you need to write each event’s tag to HBase, set this true
and add a mapping from the ‘tag_key’ to HBase column to the ‘fields_to_columns_mapping’ configuration.
The default is false.
- tag_key (optional)
-
The key where each event’s tag is kept for mapping to a HBase column.
You MUST add a mapping from the key configured here to a HBase column, to actually write tags to HBase.
The default is “tag”.
- tag_column_name (deprecated)
-
The HBase column to save the tag attached to each Fleuntd event log,
in the format “[Column family name]:[Column name]”. For example, to save tags to the column “c” in the column family “cf”, use “cf:c”.
- time_column_name (deprecated)
-
The HBase column to save the time each Fluentd event log was sent,
in the format “[Column family name]:[Colum name]”. For example, to save the time to the column “t” in the column family “cf”, use “cf:t”.
- fields_to_columns_mapping (required)
-
The mapping from JSON fields to HBase columns,
in the format “[JSON_FIELD1]=>,[JSON_FIELD2]=>,…”.
Each JSON_FIELD is formatted as field names separated by dot(.)s, e.g. “a.b.c”. Each HBASE_COLUMN is formatted as a tripled of a column family name, a colon(:), a column name, e.g. “cf:c”.
- hbase_host (required)
-
HBase host
- hbase_port (required)
-
HBase port
- hbase_table (required)
-
HBase table name
See example/fluent.conf for the configuration should work.
You can also test the configuration running a fluentd instance:
fluentd -c example/fluent.conf --plugin lib/fluent/plugin
Prerequiresites
You must setup your own Hadoop and HBase clusters and open appropriate ports to enable the plugin to access HBase via HBase Thrift Server.
The plugin is tested solely on the system with:
-
Hadoop 1.0.4
-
HBase 0.94.0
-
Java 1.6.0_37
-
Mac OS X 10.8 Mountain Lion
for now.
Please let me know if you find the plugin to work in any other environments.
Running
To make the plugin work, you need running instances of:
-
Hadoop
-
HBase
-
HBase Thrift Server w/ the compact (buffered) protocol (not the framed protocol)
The procedure may be:
-
Start Hadoop
$ start-all.sh
-
Start HBase
$ start-hbase.sh
-
Start HBase Thrift Server with the compact protocol
Use the thread pool server as it is the only server supports the compact protocol:
$ hbase thrift start -threadpool
-
Run Fluentd
Copyright
- Copyright
-
Copyright © 2012 FURYU CORPORATION
- License
-
Apache License, Version 2.0