Logstash Ouput Plugin for Vespa
Plugin for Logstash to write to Vespa. Apache 2.0 license.
Installation
Download and unpack/install Logstash, then:
bin/logstash-plugin install logstash-output-vespa_feed
Development
If you're developing the plugin, you'll want to do something like:
# build the gem
./gradlew gem
# install it as a Logstash plugin
/opt/logstash/bin/logstash-plugin install /path/to/logstash-output-vespa/logstash-output-vespa_feed-0.4.0.gem
# profit
/opt/logstash/bin/logstash
Some more good info about Logstash Java plugins can be found here.
It looks like the JVM options from here
are useful to make JRuby's bundle install
work.
Note to self: for some reason, bundle exec rake publish_gem
fails, but gem push logstash-output-vespa_feed-$VERSION.gem
does the trick.
Usage
Logstash config example:
# read stuff
input {
# if you want to just send stuff to a "message" field from the terminal
#stdin {}
file {
# let's assume we have some data in a CSV file here
path => "/path/to/data.csv"
# read the file from the beginning
start_position => "beginning"
# on Logstash restart, forget where we left off and start over again
sincedb_path => "/dev/null"
}
}
# parse and transform data here
filter {
csv {
# how does the CSV file look like?
separator => ","
quote_char => '"'
# if the first line is the header, we'll skip it
skip_header => true
# columns of the CSV file. Make sure you have these fields in the Vespa schema
columns => ["id", "description", ...]
}
# remove fields that we don't need. Here you can do a lot more processing
mutate {
remove_field => ["@timestamp", "@version", "event", "host", "log", "message"]
}
}
# publish to Vespa
output {
# for debugging. You can have multiple outputs (just as you can have multiple inputs/filters)
#stdout {}
vespa_feed { # including defaults here
# Vespa endpoint
vespa_url => "http://localhost:8080"
# for HTTPS URLS (e.g. Vespa Cloud), you may want to provide a certificate and key for mTLS authentication
client_cert => "/home/radu/vespa_apps/myapp/security/clients.pem"
# make sure the key isn't password-protected
# if it is, you can create a new key without a password like this:
# openssl rsa -in myapp_key_with_pass.pem -out myapp_key.pem
client_key => "/home/radu/vespa_apps/myapp_key.pem"
# namespace could be static or in the %{field} format, picking from a field in the document
namespace => "no_default_provide_yours"
# similarly, doc type could be static or in the %{field} format
document_type => "no_default_provide_yours_from_schema"
# take the document ID from this field in each row
# if the field doesn't exist, we generate a UUID
id_field => "id"
# how many HTTP/2 connections to keep open
max_connections => 1
# number of streams per connection
max_streams => 128
# request timeout (seconds) for each write operation
operation_timeout => 180
# after this time (seconds), the circuit breaker will be half-open:
# it will ping the endpoint to see if it's back,
# then resume sending requests when it's back
grace_period => 10
# how many times to retry on transient failures
max_retries => 10
}
}
Then you can start Logstash while pointing to the config file like:
bin/logstash -f logstash.conf