Class: BentoSearch::GoogleSiteSearchEngine
- Inherits: Object
  - Object
  - BentoSearch::GoogleSiteSearchEngine
- Extended by:
- HTTPClientPatch::IncludeClient
- Includes:
- SearchEngine
- Defined in:
- app/search_engines/bento_search/google_site_search_engine.rb
Overview
An adapter for Google Site Search/Google Custom Search
I think those are the same thing now, but they may go by different names depending on whether you are paying for it or getting it for free. The free version only gives you a courtesy limit of 100 requests/day, intended for testing.
- Create a custom/site search: www.google.com/cse
- API docs: developers.google.com/custom-search/v1/overview
- API console (to get an API key?): code.google.com/apis/console/?pli=1#project:183362013039
Limitations
- The per-page maximum is 10, which limits how useful this engine is. If you ask for more, you'll get an exception.
- Google only lets you look at the first 10 pages. If you ask for more, it won't raise; it will just give you the last page Google lets you have. The pagination object in the result set will still be correct for the page you actually got.
- The 'abstract' field is always filled out with relevant snippets from the Google API.
- The Google API supports custom 'structured data' in your web pages (from microdata and meta tags?) for custom sorting, limiting, and maybe field searching, but this code does not currently support that. It could be added as custom config in some way.
- The URL in display form is put in ResultItem#source_title. That should make it render in a reasonable place with standard display templates.
- Sort: only relevance, date_desc, and date_asc (see #sort_definitions). Custom sorts based on structured data are not supported.
- No search fields are supported at present. They may be added later after more investigation; the Google API may support both standard operators such as intitle: and custom attributes added as microdata to your pages.
- ResultItems will be set to have no OpenURLs, since no useful ones can be constructed.
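The two paging limits above can be sketched as follows. This is a hypothetical helper (`google_paging` is not part of BentoSearch, and the engine's real query construction is not shown here); it assumes the Custom Search JSON API's 1-based `start` index and `num` count, and uses `ArgumentError` as a stand-in for whatever exception the engine actually raises.

```ruby
# Hypothetical sketch: map BentoSearch-style page/per_page onto Google's
# 1-based `start` index, enforcing the limits described above.
MAX_PER_PAGE = 10 # Google returns at most 10 results per request
MAX_PAGE     = 10 # Google only serves the first 10 pages

def google_paging(page, per_page)
  if per_page > MAX_PER_PAGE
    # Asking for more than 10 per page is an error, not silently clamped.
    raise ArgumentError, "per_page #{per_page} exceeds max of #{MAX_PER_PAGE}"
  end

  # Asking for a page past the limit is silently clamped to the last page.
  actual_page = [[page, 1].max, MAX_PAGE].min
  start = (actual_page - 1) * per_page + 1

  # Return the page actually fetched, so pagination can reflect it.
  { start: start, num: per_page, page: actual_page }
end

puts google_paging(3, 10)[:start]  # prints 21
puts google_paging(15, 10)[:start] # prints 91 (clamped to page 10)
```

Returning the clamped page alongside `start` is what lets a result set report the page it really received rather than the one requested.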
Required config params
- :api_key
  API key from Google; get it from the Google API Console.
- :cx
  Identifier for your specific Google CSE; get it from “Search engine unique ID” in the CSE “Control Panel”.
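To show where the two required params end up, here is a minimal sketch of a request URL. It assumes the documented Custom Search JSON API query parameters `key`, `cx`, `q`, `num` and `start`, and reuses the `:base_url` value from `.default_configuration`; the helper name `google_query_url` is hypothetical and is not the engine's own query builder.

```ruby
require "uri"

# Base URL as given in .default_configuration (note the trailing "?").
BASE_URL = "https://www.googleapis.com/customsearch/v1?"

# Hypothetical helper: api_key goes in `key`, cx in `cx`, the query in `q`.
def google_query_url(api_key:, cx:, query:, num: 10, start: 1)
  params = { key: api_key, cx: cx, q: query, num: num, start: start }
  # encode_www_form escapes values and joins them with "&".
  BASE_URL + URI.encode_www_form(params)
end

puts google_query_url(api_key: "MY_KEY", cx: "MY_CX", query: "library hours")
# prints https://www.googleapis.com/customsearch/v1?key=MY_KEY&cx=MY_CX&q=library+hours&num=10&start=1
```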
Optional config params
- :highlighting
  Default true (see .default_configuration). If true, the title, display URL, and snippets will contain HTML <b> tags and be html_safe. If false, they will be plain text, but you'll still get snippets.
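The effect of `:highlighting` on field selection can be sketched as a lookup table. The JSON field names come from `#search_implementation`; the plain hash standing in for a ResultItem and the `display_fields` helper are hypothetical, and this sketch skips the engine's `highlight_normalize` post-processing of the HTML variants.

```ruby
# Which Custom Search JSON fields feed which ResultItem attributes,
# keyed by the :highlighting setting (true = HTML variants with <b> tags).
FIELD_MAP = {
  true  => { title: "htmlTitle", abstract: "htmlSnippet", source_title: "htmlFormattedUrl" },
  false => { title: "title",     abstract: "snippet",     source_title: "formattedUrl" }
}.freeze

# Hypothetical helper: pick the fields for one API item.
def display_fields(json_item, highlighting)
  # !! coerces any truthy/falsy config value to true/false for the lookup.
  FIELD_MAP.fetch(!!highlighting).transform_values { |key| json_item[key] }
end

item = {
  "title" => "Library Hours", "htmlTitle" => "<b>Library</b> Hours",
  "snippet" => "Open daily", "htmlSnippet" => "Open <b>daily</b>",
  "formattedUrl" => "www.example.edu/hours", "htmlFormattedUrl" => "www.example.edu/hours"
}
puts display_fields(item, true)[:title]  # prints <b>Library</b> Hours
puts display_fields(item, false)[:title] # prints Library Hours
```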
Constant Summary
Constants included from SearchEngine
Class Method Summary
Instance Method Summary
- #max_per_page ⇒ Object
  Yep, Google gives us a max of 10 per page.
- #search_implementation(args) ⇒ Object
- #sort_definitions ⇒ Object
  Google supports relevance and date sorting.
Methods included from HTTPClientPatch::IncludeClient
Methods included from SearchEngine
#display_configuration, #engine_id, #fill_in_search_metadata_for, #initialize, #normalized_search_arguments, #public_settable_search_args, #search
Methods included from SearchEngine::Capabilities
#multi_field_search?, #search_field_definitions, #search_keys, #semantic_search_keys, #semantic_search_map, #sort_keys
Class Method Details
.default_configuration ⇒ Object
    # File 'app/search_engines/bento_search/google_site_search_engine.rb', line 106
    def self.default_configuration
      {
        :base_url => 'https://www.googleapis.com/customsearch/v1?',
        :highlighting => true
      }
    end
.required_configuration ⇒ Object
    # File 'app/search_engines/bento_search/google_site_search_engine.rb', line 102
    def self.required_configuration
      [:api_key, :cx]
    end
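One way to picture how `.default_configuration` and `.required_configuration` combine with user settings is the sketch below. It is an assumption for illustration only: the real merging and validation live in the included SearchEngine module and may work differently, and `effective_config` is a hypothetical name.

```ruby
# Values copied from the two class methods above.
DEFAULTS = { base_url: "https://www.googleapis.com/customsearch/v1?", highlighting: true }.freeze
REQUIRED = [:api_key, :cx].freeze

# Hypothetical: fail fast on missing required keys, then let user values
# override the defaults.
def effective_config(user_config)
  missing = REQUIRED - user_config.keys
  raise ArgumentError, "missing required config: #{missing.join(', ')}" unless missing.empty?
  DEFAULTS.merge(user_config)
end

conf = effective_config({ api_key: "MY_KEY", cx: "MY_CX", highlighting: false })
puts conf[:highlighting] # prints false
puts conf[:base_url]     # prints https://www.googleapis.com/customsearch/v1?
```

Note that with this merge order, leaving `:highlighting` unset yields `true`, matching the default shown in `.default_configuration`.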
Instance Method Details
#max_per_page ⇒ Object
Yep, Google gives us a max of 10 per page. It also only lets us look at the first 10 pages, sorry.
    # File 'app/search_engines/bento_search/google_site_search_engine.rb', line 98
    def max_per_page
      10
    end
#search_implementation(args) ⇒ Object
    # File 'app/search_engines/bento_search/google_site_search_engine.rb', line 52
    def search_implementation(args)
      results = BentoSearch::Results.new

      url = construct_query(args)

      response = http_client.get(url)

      if response.status != 200
        results.error ||= {}
        results.error[:status] = response.status
        results.error[:response] = response.body
        return results
      end

      json = MultiJson.load(response.body)

      results.total_items = json["searchInformation"]["totalResults"].to_i

      (json["items"] || []).each do |json_item|
        item = BentoSearch::ResultItem.new

        if configuration.highlighting
          item.title = highlight_normalize json_item["htmlTitle"]
          item.abstract = highlight_normalize json_item["htmlSnippet"]
          item.source_title = highlight_normalize json_item["htmlFormattedUrl"]
        else
          item.title = json_item["title"]
          item.abstract = json_item["snippet"]
          item.source_title = json_item["formattedUrl"]
        end

        item.format_str = json_item["fileFormat"]

        item.link = json_item["link"]

        # we won't bother generating openurls for google hits, not useful
        item.openurl_disabled = true

        results << item
      end

      return results
    end
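The parsing step of `#search_implementation` can be exercised in isolation with a sample response body. This sketch swaps in the stdlib `JSON` parser for MultiJson and a `Struct` for `BentoSearch::ResultItem`, ignores highlighting, and uses a made-up sample body; `parse_google_response` is a hypothetical name. It shows the two response details the real method guards against: `totalResults` arrives as a string, and `items` may be missing entirely.

```ruby
require "json"

# Stand-in for BentoSearch::ResultItem, for illustration only.
Item = Struct.new(:title, :abstract, :source_title, :format_str, :link)

def parse_google_response(body)
  json = JSON.parse(body)
  # totalResults is a string in the API response, hence to_i
  total = json["searchInformation"]["totalResults"].to_i
  # "items" may be absent (e.g. no hits), so default to an empty array
  items = (json["items"] || []).map do |j|
    Item.new(j["title"], j["snippet"], j["formattedUrl"], j["fileFormat"], j["link"])
  end
  [total, items]
end

sample = <<~BODY
  {
    "searchInformation": { "totalResults": "1" },
    "items": [
      { "title": "Library Hours", "snippet": "Open daily",
        "formattedUrl": "www.example.edu/hours",
        "link": "https://www.example.edu/hours" }
    ]
  }
BODY

total, items = parse_google_response(sample)
puts total            # prints 1
puts items.first.link # prints https://www.example.edu/hours
```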
#sort_definitions ⇒ Object
Google supports relevance and date sorting. Other kinds of sorts are not generally available; they can be done with custom structured data, but we don't support that. We currently do date sorts as hard sorts, but this could be changed to use biases instead. See: developers.google.com/custom-search/docs/structured_data#page_dates
    # File 'app/search_engines/bento_search/google_site_search_engine.rb', line 118
    def sort_definitions
      {
        "relevance" => {},
        "date_desc" => {:implementation => "date"},
        "date_asc" => {:implementation => "date:a"}
      }
    end
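A minimal sketch of how the `:implementation` values above could be turned into the API's `sort` query parameter, assuming relevance (Google's default ordering) is expressed by sending no `sort` at all. The `google_sort_param` helper is hypothetical; the engine's actual query construction is not shown in this page.

```ruby
# Same table as #sort_definitions, with symbol-key syntax.
SORT_DEFINITIONS = {
  "relevance" => {},
  "date_desc" => { implementation: "date" },
  "date_asc"  => { implementation: "date:a" }
}.freeze

def google_sort_param(sort_key)
  # fetch raises KeyError for sorts this engine does not support
  definition = SORT_DEFINITIONS.fetch(sort_key)
  imp = definition[:implementation]
  # No implementation means default relevance ordering: send nothing.
  imp ? { sort: imp } : {}
end

puts google_sort_param("date_asc")[:sort]  # prints date:a
puts google_sort_param("relevance").empty? # prints true
```

Switching to date biases instead of hard sorts, as suggested above, would only change the strings in this table, not the lookup.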