Gem: google-site-search
Description
This gem was created to aid in the querying and parsing of the Google Site Search api.
In the simplest use case it will query your google site search for a term and supply you with an object containing the results. However I’ve built this gem with the intention that you will want to explicitly handle what and how the specific results are stored.
Installation
Add the following to your projects Gemfile.
gem 'google-site-search', :git => "[email protected]:dvallance/google-site-search.git"
Require the code if necessary (note: some frameworks like rails are set to auto-require gems for you by default)
require 'google-site-search'
Usage
The simpliest way to use the gem is by providing just a search query term and your search engine unique id code (e.g. looks like this 00255077836266642015
:u-scht7a-8i
and is located in your google site search control panel)
# just assign the query to an object
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i")
# object has search attributes like
puts search.next_results_url
puts search.previous_results_url
puts search.xml
puts search.spelling
puts search.spelling_url
# object has an array of each specific result that contains title, description and its link by default
search.results.each do |result|
puts result.title
puts result.description
puts result.link
end
The query method expects a valid url so if you wanted to supply your own you can! However I have created a builder class to help with proper url creation and to help do some of the work for you.
Multiple Search
Since google only allows a max of 20 returned results I have added a method that will capture up to n number of results.
# the array will be up to 5 search objects if the query actually has that many results.
# has soon as a search doesn't have a next_results_url the method stops.
array_of_search_results = GoogleSiteSearch.query_multiple(5, *YOUR_URL*)
Caching
To aid in caching I added a a caching_key method which sorts the query parameters, removes some unique parameters that google adds (i.e. an ie parameter which has something to do with related searchs), and compresses the result. It should be a unique representation of a search url, and used to cache results.
Blocks (and a caching example)
Both the query and query_multiple methods can take a block which executes for each Search object found.
# here is a rails example using caching and supplying a block
url = GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i")
@search = Rails.cache.fetch(GoogleSiteSearch::caching_key(url.to_s), :expires => 1.day) do |search|
GoogleSiteSearch.query(url, SearchResult) do |search|
# I can do something with the search objects.
# possibly some custom analytics for our searchs?
search.url # we can access the object
end
end
Advanced Usage
An important requirement for this gem was to be able to use structured data for:
-
querying the search api itself (i.e. filtering and sorting )
-
displaying specific information in views (i.e. display a specific field like’author’, or ‘product_type’)
Therefore I allow the developer to supply his own “Results” class to the query and allow them to parse each result xml element explicitly.
The default Result class is as follows:
class Result
attr_reader :title, :link, :description
def initialize(node)
@title = node.find_first("T").content
@link = node.find_first("UE").content
@description = node.find_first("S").content
end
end
As you can see it is very simple. Your class simply needs an initialize method that will recieve an xml node, which it can then do with as it pleases. After it is initialized it is added to the search.results array as shown previously.
See
-
libxml-ruby for help on xml parsing.
-
Googles Site Search XML API reference for the <R> tag which encapsulates the details of an individual search result.
Pagination
The google search api does the work of pagination for us, supplying the next and previous urls. The urls are relative paths and contain the search engine id parameter. Since this is a security concern I strip out the search engine id when I store them in the Search.next_results_url and Search.previous_results_url methods. This makes them safe to put in links on views and it is why you must supply the search engine id again on the paginate call; so the full url can be rebuilt for the query call.
search2 = GoogleSiteSearch.query(GoogleSiteSearch.paginate(search1.next_results_url, "00255077836266642015:u-scht7a-8i"))
Pagination Simple Example
This works and is fairly straight forward.
In your controller:
if params[:move]
@search = GoogleSiteSearch.query(GoogleSiteSearch.paginate(params[:move], "00255077836266642015:u-scht7a-8i"))
else
@search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("microsoft", "00255077836266642015:u-scht7a-8i", :num => 5))
end
In your view:
<% if @search.previous_results_url %>
<%= link_to "Previous", search_url(:move => @search.previous_results_url) %>
<% end %>
<% if @search.next_results_url %>
<%= link_to "More", search_url(:move => @search.next_results_url) %>
<% end %>
Escaping
If you start passing around the url’s in parameters you may run into issues if you don’t escape/unescape the url. If so try…
View adds escape:
<%= link_to "Previous", search_url(:move => CGI::escape(@search.previous_results_url)) %>
Controller unescapes:
@search = GoogleSiteSearch.query(GoogleSiteSearch.paginate(CGI::unescape(params[:move]), "00255077836266642015:u-scht7a-8i"))
Filtering and Sorting
See Filtering and sorting search results.
Filtering
Google expects filtering to be on the “search query” itself. However I feel my end users won’t and shouldn’t be aware of all the possible filtering options (most of my filtering will be based off of dataobject values I supply myself). So I try and keep the filters and actual “search term” separate.
From the google reference link above an example filter search query is halloween more:pagemap:document-author:lisamorton
# using the example above would look like this.
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween", "00255077836266642015:u-scht7a-8i", :filter => "more:pagemap:document-author:lisamorton")
Separate Search Term From Filters
The full “search query” is returned by google’s api and stored in the Search object in a few spots. (i.e @search.search_query method and @search.spelling_q).
To separate the search term from the filter use:
search_term, filters = GoogleSiteSearch.separate_search_term_from_filters(@search.search_query)
Sorting
Sorting would also be done by specifing a sort option.
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween", "00255077836266642015:u-scht7a-8i", :filter => "more:pagemap:document-author:lisamorton", :sort => "data-sdate")
Other Params
Any [param=value] query string additions you want to add can be assigned like the sorting above. For example to limit the search results return, to 5, would look like…
# get only 5 search results with the filtering and sorting from above still applyed.
search = GoogleSiteSearch.query(GoogleSiteSearch::UrlBuilder.new("halloween more:pagemap:document-author:lisamorton", "00255077836266642015:u-scht7a-8i", :sort => "date-sdate", :num => "5" )
Author
David Vallance