SnapSearch-Client-Ruby
SnapSearch Client Ruby is a Ruby-based, framework-agnostic HTTP client library for SnapSearch (https://snapsearch.io/).
SnapSearch provides similar libraries in other languages: https://github.com/SnapSearch/Snapsearch-Clients
Installation
SnapSearch-Client-Ruby is available through RubyGems and can be installed via:
gem install snapsearch-client-ruby
or add it to your Gemfile like this:
gem "snapsearch-client-ruby", "~> 0.1.0"
For all supported Ruby versions, check out the .travis.yml file.
Usage
SnapSearch Client Ruby is Rack-based middleware for SnapSearch. It works with all Rack-based frameworks, including Rails and Sinatra. Place the SnapSearch middleware on top of your other middleware so that it is called early in the request/response cycle. The middleware is also available as individual objects that can be called independently. In frameworks that are not Rack-based, it is best to start SnapSearch at the entry point of your application.
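In Rack, a middleware is an object that wraps the next application and responds to `call(env)`, returning a `[status, headers, body]` triple. That is why placing SnapSearch early in the stack lets it answer robot requests before the rest of your application runs. The sketch below illustrates the pattern with a hypothetical `RobotInterceptor` class and a hard-coded user-agent regex; it is not the gem's implementation, and the snapshot body is a stand-in.

```ruby
# Hypothetical, minimal Rack-style middleware illustrating interception.
# This is NOT the snapsearch-client-ruby implementation.
class RobotInterceptor
  def initialize(app, robot_pattern: /Googlebot|bingbot/i)
    @app = app                     # the next application in the stack
    @robot_pattern = robot_pattern # illustrative robot detection rule
  end

  # Rack calls #call with the request env and expects [status, headers, body].
  def call(env)
    user_agent = env['HTTP_USER_AGENT'].to_s
    if user_agent.match?(@robot_pattern)
      # Short-circuit: the wrapped application is never called for robots.
      [200, { 'Content-Type' => 'text/html' }, ['<html>snapshot</html>']]
    else
      @app.call(env)
    end
  end
end

# Any object responding to #call(env) can act as the application.
app = ->(env) { [200, { 'Content-Type' => 'text/html' }, ['<html>app</html>']] }
stack = RobotInterceptor.new(app)

p stack.call({ 'HTTP_USER_AGENT' => 'Mozilla/5.0' })[2]   # → ["<html>app</html>"]
p stack.call({ 'HTTP_USER_AGENT' => 'Googlebot/2.1' })[2] # → ["<html>snapshot</html>"]
```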
The examples folder in this repository contains a Rack example and a Sinatra example showing how to use the SnapSearch middleware in the context of an application. The instructions below are an abridged version of those examples.
For full documentation on the API and API request parameters see: https://snapsearch.io/documentation
Basic Usage
In your config.ru file, require rack/snap_search, then set up the configuration:
require 'rack/snap_search'
use Rack::SnapSearch do |config|
  config.email = 'email@example.com'
  config.key = 'API_KEY_HERE'
end
This handles everything from detecting the robot to outputting the cached snapshot. If a robot is detected, the middleware skips execution of your application and outputs the snapshot response instead. By default, only the status, the location header and the body content are output, because some headers may cause encoding errors.
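A minimal sketch of that default filtering step, assuming a response hash with `status`, `headers` (an Array of `name`/`value` pairs) and `html` keys. The helper name `snapshot_to_rack` and the sample values are hypothetical and not part of the gem's documented API.

```ruby
# Hypothetical helper: turn a SnapSearch-style response hash into a Rack
# response triple, keeping only the status, the Location header and the body.
def snapshot_to_rack(response)
  headers = {}
  response.fetch('headers', []).each do |header|
    # Compare header names case-insensitively; drop everything but Location.
    headers['Location'] = header['value'] if header['name'].casecmp?('location')
  end
  [response['status'], headers, [response['html']]]
end

response = {
  'status' => 301,
  'headers' => [
    { 'name' => 'Content-Type', 'value' => 'text/html' },
    { 'name' => 'Location', 'value' => 'https://example.com/new' }
  ],
  'html' => '<html></html>'
}

status, headers, body = snapshot_to_rack(response)
p status                 # → 301
puts headers['Location'] # → https://example.com/new
p body                   # → ["<html></html>"]
```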
Here is an example of the response hash from SnapSearch:
response = {
  "cache" => true, # or false
  "callbackResult" => "",
  "date" => 1390382314,
  "headers" => [
    {
      "name" => "Content-Type",
      "value" => "text/html"
    }
  ],
  "html" => "<html></html>",
  "message" => "Success/Failed/Validation Errors",
  "pageErrors" => [
    {
      "error" => "Error: document.querySelector(...) is null",
      "trace" => [
        {
          "file" => "filename",
          "function" => "anonymous",
          "line" => "41",
          "sourceURL" => "urltofile"
        }
      ]
    }
  ],
  "screenshot" => "BASE64 ENCODED IMAGE CONTENT",
  "status" => 200
}
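Because `headers` and `pageErrors` arrive as Arrays of Hashes, a little reshaping makes them easier to work with, for example when logging. A minimal sketch, assuming Ruby 2.6+ (for `Array#to_h` with a block) and reusing the field names and sample values from the example above; the variable names are illustrative.

```ruby
# Sample data copied from the response hash example above.
response = {
  'headers' => [{ 'name' => 'Content-Type', 'value' => 'text/html' }],
  'pageErrors' => [
    {
      'error' => 'Error: document.querySelector(...) is null',
      'trace' => [
        { 'file' => 'filename', 'function' => 'anonymous', 'line' => '41', 'sourceURL' => 'urltofile' }
      ]
    }
  ],
  'status' => 200
}

# Convert the list of name/value pairs into a plain Hash.
header_hash = response['headers'].to_h { |h| [h['name'], h['value']] }

# Flatten each JavaScript page error and its stack frames into one-line summaries.
error_lines = response['pageErrors'].flat_map do |page_error|
  page_error['trace'].map do |frame|
    "#{page_error['error']} (#{frame['sourceURL']}:#{frame['line']} in #{frame['function']})"
  end
end

puts header_hash['Content-Type'] # → text/html
puts error_lines.first
# → Error: document.querySelector(...) is null (urltofile:41 in anonymous)
```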
Advanced Usage
The Rack-based middleware has many options, and the underlying objects are even more flexible when used independently. These options can be seen in context in the examples folder:
use Rack::SnapSearch do |config|
  config.email = 'email@example.com'
  config.key = 'API_KEY_HERE'
  # Optional: The API URL to send requests to.
  config.api_url = 'https://snapsearch.io/api/v1/robot' # Default
  # Optional: The CA cert file to use when sending HTTPS requests to the API.
  config.ca_cert_file = SnapSearch.root.join('resources', 'cacert.pem') # Default
  # Optional: Check X-Forwarded-Proto if you use a load balancer that proxies HTTPS to HTTP connections.
  config.x_forwarded_proto = true # Default
  # Optional: Extra API parameters that are sent to SnapSearch.
  config.parameters = {} # Default
  # Optional: Whitelisted routes. Should be an Array of Regexp instances.
  config.matched_routes = [] # Default
  # Optional: Blacklisted routes. Should be an Array of Regexp instances.
  config.ignored_routes = [] # Default
  # Optional: A path to the JSON file containing the user agent whitelist and blacklist:
  # a single Hash with the keys `match` and `ignore`, each an Array of Strings (user agents).
  config.robots_json = SnapSearch.root.join('resources', 'robots.json') # Default
  # Optional: A path to the JSON file containing the valid file extensions for HTML resources.
  config.extensions_json = SnapSearch.root.join('resources', 'extensions.json') # Default
  # Optional: Set to `true` to check the URL for invalid (non-HTML) file extensions.
  # If there is no file extension, the request is fine. If there is one, the request could be
  # for a static file, which is not HTML that we want to intercept.
  # It is typically easier to simply whitelist or blacklist file-based routes.
  # You do not need this unless your application server (not your HTTP server) serves static
  # files such as binary content, images and non-HTML text files.
  config.check_file_extensions = false # Default
  # Optional: A block to run when an exception occurs while making requests to the API.
  config.on_exception do |exception|
    p exception
  end
  # Optional: A block to run before a bot is intercepted. You can use this for client-side caching.
  config.before_intercept do |url|
    # Get a client-side cached snapshot.
  end
  # Optional: A block to run after a bot is intercepted. You can use this for client-side caching.
  config.after_intercept do |url, response|
    # Save the client-side cached snapshot. The client-side cache time should be less than the
    # cache time you passed to SnapSearch; we recommend half the SnapSearch cache time.
  end
  # Optional: A block to manipulate the response from the SnapSearch API when a bot is intercepted.
  # The headers in this case have the form [{name: "HEADERKEY", value: "HEADERVALUE"}, ...].
  config.response_callback do |status, headers, body|
    [ status, headers, body ]
  end
end
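The `matched_routes` and `ignored_routes` options take Arrays of Regexp instances. How the gem combines them is not documented here, so the sketch below assumes that an ignored route always wins and that an empty whitelist allows every route. The helper `route_interceptable?` and the sample patterns are hypothetical.

```ruby
# Hypothetical sketch of evaluating Regexp whitelists/blacklists such as
# config.matched_routes and config.ignored_routes.
# ASSUMPTIONS: a blacklisted route is never intercepted; an empty whitelist
# means every (non-blacklisted) route may be intercepted.
def route_interceptable?(url, matched_routes: [], ignored_routes: [])
  return false if ignored_routes.any? { |regexp| regexp.match?(url) }
  return true if matched_routes.empty?

  matched_routes.any? { |regexp| regexp.match?(url) }
end

matched = [%r{\Ahttps?://[^/]+/blog}]
ignored = [%r{/admin}, /\.pdf\z/]

puts route_interceptable?('http://example.com/blog/post-1', matched_routes: matched, ignored_routes: ignored)   # → true
puts route_interceptable?('http://example.com/blog/post.pdf', matched_routes: matched, ignored_routes: ignored) # → false
puts route_interceptable?('http://example.com/about', ignored_routes: ignored)                                  # → true
```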
Check out the resources folder, which contains robots.json and extensions.json. The robots.json file contains all the search engine and social app robot user agents we're currently checking for. The extensions.json file contains all the valid file extensions that a web application might use for HTML resources. Feel free to edit them and use your own JSON files for the middleware. Always make sure to ignore the "SnapSearch" robot; otherwise you could get into an infinite interception loop.
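With `config.check_file_extensions = true`, a URL without a file extension passes, while a URL with an extension should only be intercepted if that extension belongs to an HTML resource. A minimal sketch of that rule, assuming a hypothetical extensions hash shaped as group name → Array of extensions; the helper `html_resource?` and the listed extensions are illustrative, not the contents of the gem's extensions.json.

```ruby
require 'uri'

# Hypothetical extensions data: group name => valid extensions for HTML
# resources. The real resources/extensions.json may differ.
EXTENSIONS = {
  'generic' => %w[html htm xhtml],
  'ruby' => %w[rb]
}.freeze

# True when the URL path has no extension, or a known HTML extension.
def html_resource?(url, extensions = EXTENSIONS)
  path = URI.parse(url).path.to_s
  ext = File.extname(path).delete_prefix('.').downcase
  return true if ext.empty? # no extension: likely an application route

  extensions.values.flatten.include?(ext)
end

puts html_resource?('http://example.com/products')       # → true
puts html_resource?('http://example.com/index.html?x=1') # → true
puts html_resource?('http://example.com/logo.png')       # → false
```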
The Detector instance's robots and extensions hashes are publicly accessible and can be modified at runtime:
# Add a user agent to match against:
detector.robots['match'] << 'NewRobot'
# Add a user agent to ignore:
detector.robots['ignore'] << 'MyRobot'
# Add a valid file extension for HTML resources:
detector.extensions['ruby'] << 'myvalidrubyfileextensionforhtmlresources'
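To see why ignoring the "SnapSearch" robot matters, the sketch below evaluates a user agent against `match` and `ignore` lists like the ones modified above. It assumes that ignore entries take precedence and that names are matched as case-insensitive substrings; the gem's Detector may match differently (for example, with regular expressions). The `robot?` helper and the sample lists are hypothetical.

```ruby
# ASSUMPTIONS: ignore entries win over match entries, and names are matched as
# case-insensitive substrings. The gem's Detector may match differently.
def robot?(user_agent, robots)
  agent = user_agent.downcase
  listed = ->(names) { names.any? { |name| agent.include?(name.downcase) } }
  return false if listed.call(robots['ignore'])

  listed.call(robots['match'])
end

# Hypothetical data in the match/ignore shape used by robots.json.
robots = {
  'match' => ['Googlebot', 'bingbot', 'facebookexternalhit'],
  'ignore' => ['SnapSearch']
}

# Runtime additions, as in the snippet above.
robots['match'] << 'NewRobot'
robots['ignore'] << 'MyRobot'

puts robot?('Mozilla/5.0 (compatible; Googlebot/2.1)', robots) # → true
puts robot?('SnapSearch', robots)                               # → false
puts robot?('MyRobot/1.0 (like NewRobot)', robots)              # → false
puts robot?('Mozilla/5.0 (Windows NT 10.0)', robots)            # → false
```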
Development
Install the Bundler dependency management tool:
gem install bundler
Install/update all dependencies:
bundle install
See all build tasks:
bundle exec rake -T
Make your changes, then release a new version tag with the following command (see also the other rake version:bump:... tasks):
bundle exec rake version:bump
Synchronise and push the tag to GitHub:
git push
git push --
Create the gem package:
bundle exec rake gem
Push the gem to Ruby Gems:
gem push pkg/snapsearch-client-ruby-MAJOR.MINOR.PATCH.gem
Tests
Tests are written with RSpec. Run them with:
bundle exec rspec spec/