Class: BentoSearch::GoogleSiteSearchEngine

Inherits:
Object
Extended by:
HTTPClientPatch::IncludeClient
Includes:
SearchEngine
Defined in:
app/search_engines/bento_search/google_site_search_engine.rb

Overview

An adapter for Google Site Search/Google Custom Search

I think those are the same thing now, but they may get different names depending on whether you are paying or using the free version. The free version only gives you a courtesy limit of 100 requests/day, for testing.

  • Create a custom/site search: www.google.com/cse

  • API docs: developers.google.com/custom-search/v1/overview

  • API console (to get an API key?): code.google.com/apis/console/?pli=1#project:183362013039

Limitations

  • The per-page maximum is 10, which limits how useful this engine is. If you ask for more, you’ll get an exception.

  • Google only lets you look at the first 10 pages. If you ask for more, it won’t raise; it’ll just give you the last page Google will let you have. The pagination object in the result set will reflect the page you actually got, though.

  • ‘abstract’ field always filled out with relevant snippets from google api.

  • The Google API supports custom ‘structured data’ in your web pages (from microdata and meta tags?) for custom sorting and limiting and maybe field searching, but this code does not currently support that. It could be added as custom config in some way.

  • The URL in display form is put in ResultItem#source_title. That should result in it rendering in a reasonable place with standard display templates.

  • Sort: only relevance, date_desc, and date_asc (see #sort_definitions). Custom sorts based on structured data are not supported.

  • No search fields are supported at present. They may be added later after more investigation; the Google API may support both standard operators such as intitle and custom attributes added in microdata to your pages.

  • ResultItems will be set to have no OpenURLs, since no useful ones can be constructed.
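
The two paging limits above can be sketched as follows. This is a minimal illustration, not code from the engine: `google_paging` and its constants are hypothetical names, and it assumes the 10-page ceiling applies regardless of page size. The `start` value follows Google’s convention of a 1-based index for the first result.

```ruby
# Hypothetical helper, not part of BentoSearch: mirrors the limits described
# above (at most 10 results per page, only the first 10 pages reachable).
MAX_PER_PAGE = 10
MAX_PAGES    = 10

def google_paging(page, per_page)
  # Asking for too many results per page is an error.
  if per_page > MAX_PER_PAGE
    raise ArgumentError, "per_page #{per_page} exceeds maximum of #{MAX_PER_PAGE}"
  end

  # Asking past the last reachable page quietly yields the last one instead.
  effective_page = [[page, 1].max, MAX_PAGES].min

  {
    :page  => effective_page,
    :num   => per_page,
    # 1-based index of the first result on the effective page.
    :start => (effective_page - 1) * per_page + 1
  }
end

third = google_paging(3, 10)   # third[:start] is 21
last  = google_paging(25, 10)  # clamped: last[:page] is 10, last[:start] is 91
```

Returning the effective page rather than the requested one is what lets a caller build a pagination object that matches the page actually received.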

Required config params

:api_key

api_key from google, get from Google API Console

:cx

identifier for specific google CSE, get from “Search engine unique ID” in CSE “Control Panel”

Optional config params

:highlighting

Default true (see .default_configuration). If true, then title, display URL, and snippets will have HTML <b> tags in them, and be html_safe. If false, they will be plain text, but you’ll still get snippets.
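
As a rough illustration of how the required and optional params combine, here is a sketch using plain hashes. It is an assumption-laden stand-in: BentoSearch uses its own configuration object, and `build_configuration`, `REQUIRED_KEYS`, and `DEFAULTS` are hypothetical names. The default values are copied from .default_configuration and the required keys from .required_configuration.

```ruby
# Hypothetical stand-in for the engine's configuration handling; BentoSearch
# itself uses its own configuration object, not plain hashes.
REQUIRED_KEYS = [:api_key, :cx]
DEFAULTS = {
  :base_url     => 'https://www.googleapis.com/customsearch/v1?',
  :highlighting => true
}

def build_configuration(user_config)
  # Refuse to continue when a required key is absent.
  missing = REQUIRED_KEYS.reject { |key| user_config.key?(key) }
  unless missing.empty?
    raise ArgumentError, "missing required config: #{missing.join(', ')}"
  end

  # User-supplied values win over the defaults.
  DEFAULTS.merge(user_config)
end

# Placeholder credentials; substitute real values from the Google consoles.
config = build_configuration(
  :api_key      => "YOUR_API_KEY",
  :cx           => "YOUR_SEARCH_ENGINE_ID",
  :highlighting => false
)
```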

Constant Summary

Constants included from SearchEngine

SearchEngine::DefaultPerPage

Class Method Summary

Instance Method Summary

Methods included from HTTPClientPatch::IncludeClient

include_http_client

Methods included from SearchEngine

#display_configuration, #engine_id, #fill_in_search_metadata_for, #initialize, #normalized_search_arguments, #public_settable_search_args, #search

Methods included from SearchEngine::Capabilities

#multi_field_search?, #search_field_definitions, #search_keys, #semantic_search_keys, #semantic_search_map, #sort_keys

Class Method Details

.default_configuration ⇒ Object



# File 'app/search_engines/bento_search/google_site_search_engine.rb', line 106

def self.default_configuration
  {
    :base_url => 'https://www.googleapis.com/customsearch/v1?',
    :highlighting => true
  }
end

.required_configuration ⇒ Object



# File 'app/search_engines/bento_search/google_site_search_engine.rb', line 102

def self.required_configuration
  [:api_key, :cx]
end

Instance Method Details

#max_per_page ⇒ Object

Google gives us a maximum of 10 results per page. It also only lets us look at the first 10 pages, sorry.



# File 'app/search_engines/bento_search/google_site_search_engine.rb', line 98

def max_per_page
  10
end

#search_implementation(args) ⇒ Object



# File 'app/search_engines/bento_search/google_site_search_engine.rb', line 52

def search_implementation(args)
  results = BentoSearch::Results.new

  url = construct_query(args)

  response = http_client.get(url)

  if response.status != 200
    results.error ||= {}
    results.error[:status] = response.status
    results.error[:response] = response.body
    return results
  end

  json = MultiJson.load(response.body)

  results.total_items =  json["searchInformation"]["totalResults"].to_i

  (json["items"] || []).each do |json_item|
    item = BentoSearch::ResultItem.new

    if configuration.highlighting
      item.title          = highlight_normalize json_item["htmlTitle"]
      item.abstract       = highlight_normalize json_item["htmlSnippet"]
      item.source_title  = highlight_normalize json_item["htmlFormattedUrl"]
    else
      item.title          = json_item["title"]
      item.abstract       = json_item["snippet"]
      item.source_title   = json_item["formattedUrl"]
    end

    item.format_str       = json_item["fileFormat"]

    item.link             = json_item["link"]

    # we won't bother generating openurls for google hits, not useful
    item.openurl_disabled = true

    results << item
  end

  return results
end
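
The field selection in the method above can be tried in isolation with a small sketch. Assumptions: it uses the standard library’s JSON rather than MultiJson, returns plain hashes instead of BentoSearch::ResultItem objects, skips highlight_normalize (whose source is not shown here), and runs against a made-up payload. `extract_items` and `SAMPLE_BODY` are hypothetical names.

```ruby
require 'json'

# Illustrative payload only: the field names come from the method above,
# the values are made up for this sketch.
SAMPLE_BODY = <<~JSON
  {
    "searchInformation": { "totalResults": "1" },
    "items": [
      {
        "title": "Library hours",
        "htmlTitle": "<b>Library</b> hours",
        "snippet": "Opening hours for the library.",
        "htmlSnippet": "Opening hours for the <b>library</b>.",
        "formattedUrl": "example.org/hours",
        "htmlFormattedUrl": "example.org/hours",
        "link": "http://example.org/hours"
      }
    ]
  }
JSON

# Hypothetical helper: picks the html* or plain fields depending on the
# highlighting flag, as search_implementation does.
def extract_items(body, highlighting)
  json = JSON.parse(body)

  title_key, abstract_key, url_key =
    highlighting ? %w[htmlTitle htmlSnippet htmlFormattedUrl] : %w[title snippet formattedUrl]

  # A response with no hits has no "items" key at all, hence the fallback.
  items = (json["items"] || []).map do |json_item|
    {
      :title        => json_item[title_key],
      :abstract     => json_item[abstract_key],
      :source_title => json_item[url_key],
      :link         => json_item["link"]
    }
  end

  { :total_items => json["searchInformation"]["totalResults"].to_i, :items => items }
end

plain       = extract_items(SAMPLE_BODY, false)
highlighted = extract_items(SAMPLE_BODY, true)
```

The `.to_i` call matters because the total count arrives as a JSON string rather than a number.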

#sort_definitions ⇒ Object

Google supports relevance and date sorting. Other kinds of sorts are not generally available; they are possible with custom structured data, but we don’t support that. We currently do date sorts as hard sorts, but this could be changed to use biases instead. See: developers.google.com/custom-search/docs/structured_data#page_dates



# File 'app/search_engines/bento_search/google_site_search_engine.rb', line 118

def sort_definitions
  {
    "relevance" => {},
    "date_desc" => {:implementation => "date"},
    "date_asc"  => {:implementation => "date:a"}
  }
end
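
The :implementation values presumably end up in the request’s sort parameter. The engine’s construct_query is not shown on this page, so the following is only a sketch of how that mapping might look: `build_query_url` is a hypothetical name, and the parameter names key, cx, q, and sort are assumed from Google’s Custom Search API rather than taken from the source.

```ruby
require 'uri'

# Copied from sort_definitions above.
SORT_DEFINITIONS = {
  "relevance" => {},
  "date_desc" => { :implementation => "date" },
  "date_asc"  => { :implementation => "date:a" }
}

# Hypothetical stand-in for construct_query, whose source is not shown here.
def build_query_url(base_url, api_key, cx, query, sort_key = "relevance")
  definition = SORT_DEFINITIONS.fetch(sort_key) do
    raise ArgumentError, "unknown sort: #{sort_key}"
  end

  params = { "key" => api_key, "cx" => cx, "q" => query }

  # Relevance is the default ordering, so it adds no sort parameter.
  sort_value = definition[:implementation]
  params["sort"] = sort_value if sort_value

  # base_url already ends in "?", as in default_configuration.
  base_url + URI.encode_www_form(params)
end

url = build_query_url(
  'https://www.googleapis.com/customsearch/v1?',
  "YOUR_API_KEY", "YOUR_SEARCH_ENGINE_ID", "library hours", "date_asc"
)
```

`URI.encode_www_form` percent-encodes the colon in "date:a", so the sort value reaches the URL as `date%3Aa`.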