Class: GoogleBookSearch
- Includes: MetadataHelper, UmlautHttp
- Defined in: app/service_adaptors/google_book_search.rb
Overview
Service that searches Google Book Search to determine viewability. It searches by ISBN, OCLCNUM and/or LCCN.
Uses the Google Books API; see code.google.com/apis/books/docs/v1/getting_started.html and code.google.com/apis/books/docs/v1/using.html
If a full view is available, it returns a fulltext service response. If only a partial view is available, it returns the link as “limited excerpts”. If there is no view at all, it still includes a link in highlighted_links, to pay lip service to Google branding requirements.
Unfortunately there is no way to tell which of the noview books provide search, although some do; search is advertised only if a full or partial view is available.
If a thumbnail_url is returned in the responses, a cover image is displayed.
Google API Key
Setting an API key in :api_key is STRONGLY recommended, or you’ll probably get rate limited (it is not clear what the limit is when no API key is supplied). You may have to ask for a higher rate limit for your API key than the default 1000/day, which you can do through the Google API console: code.google.com/apis/console
I requested 50k with this message, and was quickly approved with no questions: “Services for academic library (Johns Hopkins Libraries) web applications to match Google Books availability to items presented by our catalog, OpenURL link resolver, and other software.”
Recommend setting your ‘per user limit’ to something very high, as well as requesting more quota.
Constant Summary
- ViewFullValue = 'ALL_PAGES'
  Identifiers used in API response to indicate viewability level
- ViewPartialValue = 'PARTIAL'
- ViewNoneValue = 'NO_PAGES'
  None might also be ‘snippet’, but Google doesn’t want to distinguish
- ViewUnknownValue = 'UNKNOWN'
Constants inherited from Service
Service::LinkOutFilterTask, Service::StandardTask
Instance Attribute Summary
- #display_name ⇒ Object (readonly)
  attr_reader is important for tests.
- #num_full_views ⇒ Object (readonly)
  attr_reader is important for tests.
- #url ⇒ Object (readonly)
  attr_reader is important for tests.
Attributes inherited from Service
#group, #name, #priority, #request, #service_id, #status, #task
Instance Method Summary
- #add_cover_image(request, url) ⇒ Object
- #add_search_inside(request, data) ⇒ Object
- #build_headers(request) ⇒ Object
  We don’t need to fake a proxy request anymore, but we still include X-Forwarded-For so Google can return location-appropriate availability.
- #create_fulltext_service_response(request, data) ⇒ Object
  We only create a fulltext service response if we have a full view.
- #do_query(bibkeys, request) ⇒ Object
- #do_web_links(request, data) ⇒ Object
  Creates a highlighted_link service response for partial and noview. Only shows one web link.
- #element_enhance(request, rft_key, value) ⇒ Object
  Will not overwrite existing referent values.
- #enhance_referent(request, data) ⇒ Object
  Take the FIRST hit from Google, and use its values to enhance our metadata.
- #find_entries(gbs_response, viewabilities) ⇒ Object
- #find_thumbnail_url(data) ⇒ Object
  Not all responses have a thumbnail_url.
- #get_bibkeys(rft) ⇒ Object
  Returns nil or an escaped string of bibkeys. To increase the chances of a good hit, we send all available bibkeys and later dedupe by id.
- #handle(request) ⇒ Object
- #initialize(config) ⇒ GoogleBookSearch (constructor)
  A new instance of GoogleBookSearch.
- #remove_query_context(url) ⇒ Object
  Google gives us a URL to the book that contains a ‘dq’ param with the original query, which for us is an ISBN/LCCN/OCLCnum query, which we don’t actually want to leave in there.
- #response_url(service_response, submitted_params) ⇒ Object
  Catch url_for call for search_inside, because we’re going to redirect.
- #service_types_generated ⇒ Object
Methods included from UmlautHttp
#http_fetch, #proxy_like_headers
Methods included from MetadataHelper
#get_doi, #get_epage, #get_gpo_item_nums, #get_identifier, #get_isbn, #get_issn, #get_lccn, #get_month, #get_oclcnum, #get_pmid, #get_search_creator, #get_search_terms, #get_search_title, #get_spage, #get_sudoc, #get_top_level_creator, #get_year, #normalize_lccn, #normalize_title, #raw_search_title, #title_is_serial?
Methods included from MarcHelper
#add_856_links, #edition_statement, #get_title, #get_years, #gmd_values, #service_type_for_856, #should_skip_856_link?, #strip_gmd
Methods inherited from Service
#credits, #handle_wrapper, #link_out_filter, #preempted_by, required_config_params, #response_to_view_data, #view_data_from_service_type
Constructor Details
#initialize(config) ⇒ GoogleBookSearch
Returns a new instance of GoogleBookSearch.
    # File 'app/service_adaptors/google_book_search.rb', line 63
    def initialize(config)
      @url = 'https://www.googleapis.com/books/v1/volumes?q='
      @display_name = 'Google Books'
      # number of full views to show
      @num_full_views = 1
      # default on, to enhance our metadata with stuff from google
      @referent_enhance = true
      # google api key strongly recommended, otherwise you'll
      # probably get rate limited.
      @api_key = nil

      @credits = {
        "Google Books" => "http://books.google.com/"
      }

      # While you can theoretically look up by LCCN on Google Books,
      # we have found FREQUENT false positives. There's no longer any
      # way to even report these to Google. By default, don't lookup
      # by LCCN.
      @lookup_by_lccn = false

      super(config)
    end
Instance Attribute Details
#display_name ⇒ Object (readonly)
attr_reader is important for tests
    # File 'app/service_adaptors/google_book_search.rb', line 50
    def display_name
      @display_name
    end
#num_full_views ⇒ Object (readonly)
attr_reader is important for tests
    # File 'app/service_adaptors/google_book_search.rb', line 50
    def num_full_views
      @num_full_views
    end
#url ⇒ Object (readonly)
attr_reader is important for tests
    # File 'app/service_adaptors/google_book_search.rb', line 50
    def url
      @url
    end
Instance Method Details
#add_cover_image(request, url) ⇒ Object
    # File 'app/service_adaptors/google_book_search.rb', line 392
    def add_cover_image(request, url)
      zoom_url = url.clone

      # if we're sent to a page other than the frontcover then strip out the
      # page number and insert front cover
      zoom_url.sub!(/&pg=.*?&/, '&printsec=frontcover&')

      # hack out the 'curl' if we can
      zoom_url.sub!('&edge=curl', '')

      request.add_service_response(
        :service => self,
        :display_text => 'Cover Image',
        :url => zoom_url,
        :size => "medium",
        :service_type_value => :cover_image
      )
    end
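The two substitutions in add_cover_image can be tried outside of Umlaut. The following is a minimal sketch only: the helper name `cover_zoom_url` and the sample thumbnail URL are illustrative assumptions, not part of the adaptor.

```ruby
# Hypothetical helper mirroring the two substitutions in add_cover_image.
def cover_zoom_url(url)
  zoom_url = url.dup
  # Replace any page parameter with a request for the front cover.
  zoom_url.sub!(/&pg=.*?&/, '&printsec=frontcover&')
  # Drop the page-curl decoration if it is present.
  zoom_url.sub!('&edge=curl', '')
  zoom_url
end

# The sample URL is made up for illustration.
sample = 'http://books.google.com/books?id=abc123&pg=PA5&img=1&zoom=1&edge=curl'
puts cover_zoom_url(sample)
```

Note that the page pattern only matches when `pg` is followed by another parameter, because the regular expression requires a trailing `&`.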
#add_search_inside(request, data) ⇒ Object
    # File 'app/service_adaptors/google_book_search.rb', line 321
    def add_search_inside(request, data)
      # Just take the first one we find, if multiple
      searchable_view = find_entries(data, [ViewFullValue, ViewPartialValue])[0]

      if ( searchable_view )
        url = searchable_view["volumeInfo"]["infoLink"]

        request.add_service_response(
          :service => self,
          :display_text => @display_name,
          :url => remove_query_context(url),
          :service_type_value => :search_inside
        )
      end
    end
#build_headers(request) ⇒ Object
We don’t need to fake a proxy request anymore, but we still include X-Forwarded-For so google can return location-appropriate availability. If there’s an existing X-Forwarded-For, we respect it and add on to it.
    # File 'app/service_adaptors/google_book_search.rb', line 246
    def build_headers(request)
      original_forwarded_for = nil
      if (request.http_env && request.http_env['HTTP_X_FORWARDED_FOR'])
        original_forwarded_for = request.http_env['HTTP_X_FORWARDED_FOR']
      end

      # we used to prepare a comma separated list in x-forwarded-for if
      # we had multiple requests, as per the x-forwarded-for spec, but I
      # think Google doesn't like it.
      ip_address = (original_forwarded_for ?
        original_forwarded_for :
        request.client_ip_addr.to_s)

      return {} if ip_address.blank?

      # If we've got a comma-separated list from an X-Forwarded-For, we
      # can't send it on to google, google won't accept that, just take
      # the first one in the list, which is actually the ultimate client
      # IP. split returns the whole string if separator isn't found, convenient.
      ip_address = ip_address.split(",").first

      # If all we have is an internal/private IP from the internal network,
      # do NOT send that to Google, or Google will give you a 503 error
      # and refuse to process your request, as of 7 sep 2011. sigh.
      # Also if it doesn't look like an IP at all, forget it, don't send it.
      # The pattern must match a full dotted-quad IPv4 address.
      if ((ip_address !~ /\A\d+\.\d+\.\d+\.\d+\z/) ||
          ip_address.start_with?("10.") ||
          ip_address.start_with?("172.16") ||
          ip_address.start_with?("192.168"))
        return {}
      else
        return {'X-Forwarded-For' => ip_address }
      end
    end
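As an alternative illustration of the same decision, the private-address check can be delegated to Ruby's standard `ipaddr` library, which covers the whole 172.16.0.0/12 block rather than only the `172.16` prefix. This is a sketch under assumptions: the helper name `forwarded_for_header` and its two string arguments are hypothetical, and it is not how the adaptor itself is written.

```ruby
require 'ipaddr'

# Hypothetical standalone version of the header-building decision.
# Both arguments are plain strings (or nil) in this sketch.
def forwarded_for_header(x_forwarded_for, client_ip)
  candidate = (x_forwarded_for || client_ip).to_s
  # Only the first entry of a comma-separated list is the original client.
  candidate = candidate.split(',').first.to_s.strip
  return {} if candidate.empty?

  ip = IPAddr.new(candidate)
  # Skip non-IPv4 and private addresses (10/8, 172.16/12, 192.168/16).
  return {} if !ip.ipv4? || ip.private?

  { 'X-Forwarded-For' => candidate }
rescue IPAddr::Error
  # Not parseable as an IP address at all: send no header.
  {}
end

p forwarded_for_header('8.8.8.8, 10.0.0.1', '10.0.0.2')
p forwarded_for_header(nil, '192.168.1.20')
```

`IPAddr#private?` requires Ruby 2.5 or later.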
#create_fulltext_service_response(request, data) ⇒ Object
We only create a fulltext service response if we have a full view. We create only as many full views as are specified in config.
    # File 'app/service_adaptors/google_book_search.rb', line 298
    def create_fulltext_service_response(request, data)
      display_name = @display_name

      full_views = find_entries(data, ViewFullValue)
      return nil if full_views.empty?

      count = 0
      full_views.each do |fv|
        uri = fv["volumeInfo"]["previewLink"]
        request.add_service_response(
          :service => self,
          :display_text => display_name,
          :url => remove_query_context(uri),
          :service_type_value => :fulltext
        )
        count += 1
        break if count == @num_full_views
      end
      return true
    end
#do_query(bibkeys, request) ⇒ Object
    # File 'app/service_adaptors/google_book_search.rb', line 208
    def do_query(bibkeys, request)
      headers = build_headers(request)
      link = @url + bibkeys
      if @api_key
        link += "&key=#{@api_key}"
      end

      # Add on limit to only request books, not magazines.
      link += "&printType=books"

      Rails.logger.debug("GoogleBookSearch requesting: #{link}")
      response = http_fetch(link, :headers => headers, :raise_on_http_error_code => false)
      data = MultiJson.load(response.body)

      # If Google gives us an error cause it says it can't geo-locate,
      # remove the IP, log warning, and try again.
      if (data["error"] && data["error"]["errors"] &&
          data["error"]["errors"].find {|h| h["reason"] == "unknownLocation"} )
        Rails.logger.warn("GoogleBookSearch: geo-locate error, retrying without X-Forwarded-For: '#{link}' headers: #{headers.inspect} #{response.inspect}\n #{data.inspect}")

        response = http_fetch(link, :raise_on_http_error_code => false)
        data = MultiJson.load(response.body)
      end

      if (! response.kind_of?(Net::HTTPSuccess)) || data["error"]
        Rails.logger.error("GoogleBookSearch error: '#{link}' headers: #{headers.inspect} #{response.inspect}\n #{data.inspect}")
      end

      return data
    end
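The geo-location retry in do_query hinges on one check against the parsed error body. A minimal sketch of that check follows, using the standard `json` library in place of MultiJson. The helper name `unknown_location_error?` and the sample body are assumptions for illustration; the sample contains only the keys that the method above reads.

```ruby
require 'json'

# Hypothetical predicate: true when the parsed response carries an
# error entry whose reason is "unknownLocation".
def unknown_location_error?(data)
  errors = data.dig('error', 'errors') || []
  errors.any? { |e| e['reason'] == 'unknownLocation' }
end

# Made-up body containing only the keys do_query looks at.
body = '{"error": {"errors": [{"reason": "unknownLocation"}]}}'
data = JSON.parse(body)

puts unknown_location_error?(data)
puts unknown_location_error?('totalItems' => 0)
```

Using `dig` avoids the chained nil checks, at the cost of requiring Ruby 2.3 or later.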
#do_web_links(request, data) ⇒ Object
Creates a highlighted_link service response for partial and noview. Only shows one web link, preferring a partial view over a noview. Some noviews have a snippet/search, but we have no way to tell.
    # File 'app/service_adaptors/google_book_search.rb', line 341
    def do_web_links(request, data)
      # some noview items will have a snippet view, but we have no way to tell
      info_views = find_entries(data, ViewPartialValue)
      viewability = ViewPartialValue

      if info_views.blank?
        info_views = find_entries(data, ViewNoneValue)
        viewability = ViewNoneValue
      end

      # Shouldn't ever get to this point, but just in case
      return nil if info_views.blank?

      url = ''
      iv = info_views.first
      type = nil
      if (viewability == ViewPartialValue && url = iv["volumeInfo"]["previewLink"])
        display_text = @display_name
        type = ServiceTypeValue[:excerpts]
      else
        url = iv["volumeInfo"]["infoLink"]
        display_text = "Book Information"
        type = ServiceTypeValue[:highlighted_link]
      end

      request.add_service_response(
        :service => self,
        :url => remove_query_context(url),
        :display_text => display_text,
        :service_type_value => type
      )
    end
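The preference order described above (partial view with a preview link first, otherwise an information link) can be sketched on plain hashes. The helper name `pick_web_link`, the returned hash shape, and the sample items are all hypothetical; the real method adds a service response to the request instead of returning a value.

```ruby
# Hypothetical helper: returns the chosen link and its service type,
# or nil when there is neither a partial nor a noview entry.
def pick_web_link(items)
  entry = items.find { |i| i.dig('accessInfo', 'viewability') == 'PARTIAL' }
  preview = entry && entry.dig('volumeInfo', 'previewLink')
  return { url: preview, type: :excerpts } if preview

  # No usable preview: fall back to the info link of the partial entry,
  # or of the first noview entry when there was no partial entry.
  entry ||= items.find { |i| i.dig('accessInfo', 'viewability') == 'NO_PAGES' }
  return nil unless entry

  { url: entry.dig('volumeInfo', 'infoLink'), type: :highlighted_link }
end

# Sample items are made up for illustration.
items = [
  { 'accessInfo' => { 'viewability' => 'NO_PAGES' },
    'volumeInfo' => { 'infoLink' => 'http://example.org/info1' } },
  { 'accessInfo' => { 'viewability' => 'PARTIAL' },
    'volumeInfo' => { 'previewLink' => 'http://example.org/preview2',
                      'infoLink' => 'http://example.org/info2' } }
]
p pick_web_link(items)
```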
#element_enhance(request, rft_key, value) ⇒ Object
Will not over-write existing referent values.
    # File 'app/service_adaptors/google_book_search.rb', line 172
    def element_enhance(request, rft_key, value)
      if (value)
        request.referent.enhance_referent(rft_key, value.to_s, true, false, :overwrite => false)
      end
    end
#enhance_referent(request, data) ⇒ Object
Take the FIRST hit from Google, and use its values to enhance our metadata. Will NOT overwrite existing data.
    # File 'app/service_adaptors/google_book_search.rb', line 126
    def enhance_referent(request, data)
      entry = data["items"].first

      if (volumeInfo = entry["volumeInfo"])
        title = volumeInfo["title"]
        title += ": #{volumeInfo["subtitle"]}" if (title && volumeInfo["subtitle"])

        element_enhance(request, "title", title)
        element_enhance(request, "au", volumeInfo["authors"].first) if volumeInfo["authors"]
        element_enhance(request, "pub", volumeInfo["publisher"])

        element_enhance(request, "tpages", volumeInfo["pageCount"])

        # The assignment is parenthesized so that `date` holds the
        # published date before it is matched against the year pattern.
        if (date = volumeInfo["publishedDate"]) && date =~ /^(\d\d\d\d)/
          element_enhance(request, "date", $1)
        end

        # LCCN is only rarely included, but is sometimes, eg:
        # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"LCCN:72627172"}],
        # Also "LCCN:76630875"
        #
        # And sometimes OCLC number like:
        # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"OCLC:12345678"}],
        #
        (volumeInfo["industryIdentifiers"] || []).each do |hash|
          if hash["type"] == "ISBN_13"
            element_enhance(request, "isbn", hash["identifier"])

          elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("LCCN:")
            lccn = normalize_lccn( hash["identifier"].slice(5, hash["identifier"].length) )
            request.referent.add_identifier("info:lccn/#{lccn}")

          elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("OCLC:")
            oclcnum = normalize_lccn( hash["identifier"].slice(5, hash["identifier"].length) )
            request.referent.add_identifier("info:oclcnum/#{oclcnum}")
          end
        end
      end
    end
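The identifier handling above can be illustrated on a plain hash. This is a minimal sketch, not the adaptor's code: the helper name `extract_identifiers` and its return shape are assumptions, and it skips the LCCN normalization that the real method applies. The sample identifier values are the ones quoted in the source comments plus a made-up ISBN.

```ruby
# Hypothetical helper: pulls the year and identifiers out of a
# volumeInfo hash and returns them in a plain Ruby hash.
def extract_identifiers(volume_info)
  result = {}

  date = volume_info['publishedDate']
  # Keep only the leading four-digit year, e.g. "1972-05" gives "1972".
  result[:year] = Regexp.last_match(1) if date && date =~ /\A(\d{4})/

  (volume_info['industryIdentifiers'] || []).each do |id|
    type = id['type']
    value = id['identifier'].to_s
    if type == 'ISBN_13'
      result[:isbn] = value
    elsif type == 'OTHER' && value.start_with?('LCCN:')
      result[:lccn] = value.delete_prefix('LCCN:')
    elsif type == 'OTHER' && value.start_with?('OCLC:')
      result[:oclcnum] = value.delete_prefix('OCLC:')
    end
  end

  result
end

volume_info = {
  'publishedDate' => '1972-05',
  'industryIdentifiers' => [
    { 'type' => 'OTHER', 'identifier' => 'LCCN:72627172' },
    { 'type' => 'ISBN_13', 'identifier' => '9780140449136' }
  ]
}
p extract_identifiers(volume_info)
```

`String#delete_prefix` requires Ruby 2.5 or later; `slice(5, length)` as used in the adaptor works on older versions.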
#find_entries(gbs_response, viewabilities) ⇒ Object
    # File 'app/service_adaptors/google_book_search.rb', line 282
    def find_entries(gbs_response, viewabilities)
      unless (viewabilities.kind_of?(Array))
        viewabilities = [viewabilities]
      end

      entries = gbs_response["items"].find_all do |entry|
        viewability = entry["accessInfo"]["viewability"]
        (viewability && viewabilities.include?(viewability))
      end

      return entries
    end
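A minimal sketch of the same filtering on a hand-built response, to show what the method accepts and returns. The helper name `entries_with_viewability` and the sample items are hypothetical; unlike the method above, this variant also tolerates a response with no "items" key.

```ruby
# Hypothetical variant of find_entries that works on plain hashes.
# `viewabilities` may be a single string or an array of strings.
def entries_with_viewability(response, viewabilities)
  wanted = Array(viewabilities)
  (response['items'] || []).select do |entry|
    wanted.include?(entry.dig('accessInfo', 'viewability'))
  end
end

# Sample response is made up for illustration.
response = {
  'items' => [
    { 'id' => 'a', 'accessInfo' => { 'viewability' => 'ALL_PAGES' } },
    { 'id' => 'b', 'accessInfo' => { 'viewability' => 'PARTIAL' } },
    { 'id' => 'c', 'accessInfo' => { 'viewability' => 'NO_PAGES' } }
  ]
}

p entries_with_viewability(response, 'ALL_PAGES').map { |e| e['id'] }
p entries_with_viewability(response, %w[ALL_PAGES PARTIAL]).map { |e| e['id'] }
```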
#find_thumbnail_url(data) ⇒ Object
Not all responses have a thumbnail_url. We look for them and return the 1st.
    # File 'app/service_adaptors/google_book_search.rb', line 379
    def find_thumbnail_url(data)
      entries = data["items"].collect do |entry|
        entry["volumeInfo"]["imageLinks"]["thumbnail"] if entry["volumeInfo"] && entry["volumeInfo"]["imageLinks"]
      end

      # remove nil values
      entries.compact!

      # pick the first of the available thumbnails, or nil
      return entries[0]
    end
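For illustration, the same lookup can stop at the first thumbnail instead of collecting every entry first. The helper name `first_thumbnail` and the sample data are assumptions; the result is meant to match what the method above returns for the same input.

```ruby
# Hypothetical helper: returns the first thumbnail URL found, or nil.
def first_thumbnail(response)
  (response['items'] || []).each do |entry|
    thumb = entry.dig('volumeInfo', 'imageLinks', 'thumbnail')
    return thumb if thumb
  end
  nil
end

# Sample response is made up: the first item has no imageLinks at all.
response = {
  'items' => [
    { 'volumeInfo' => { 'title' => 'No cover here' } },
    { 'volumeInfo' => { 'imageLinks' => { 'thumbnail' => 'http://example.org/thumb.jpg' } } }
  ]
}
puts first_thumbnail(response)
```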
#get_bibkeys(rft) ⇒ Object
Returns nil or an escaped string of bibkeys. To increase the chances of a good hit, we send all available bibkeys and later dedupe by id. FIXME: assumes we only have one of each kind of identifier.
    # File 'app/service_adaptors/google_book_search.rb', line 183
    def get_bibkeys(rft)
      isbn = get_identifier(:urn, "isbn", rft)
      oclcnum = get_identifier(:info, "oclcnum", rft)
      lccn = get_lccn(rft)

      # Google doesn't officially support oclc/lccn search, but does
      # index as token with prefix smashed up right with identifier
      # eg http://books.google.com/books/feeds/volumes?q=OCLC32012617
      #
      # Except turns out doing it as a phrase search is important! Or
      # google's normalization/tokenization does odd things.
      keys = []
      keys << ('isbn:' + isbn) if isbn
      keys << ('"' + "OCLC" + oclcnum + '"') if oclcnum
      # Only use LCCN if we've got nothing else, and we're allowing it.
      # it returns many false positives.
      if @lookup_by_lccn && lccn && keys.length == 0
        keys << ('"' + 'LCCN' + lccn + '"')
      end

      return nil if keys.empty?
      keys = CGI.escape( keys.join(' OR ') )
      return keys
    end
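To see what the query string looks like before and after escaping, the key-building logic can be run on plain values. This is a sketch under assumptions: the helper name `build_bibkeys` and its keyword arguments are hypothetical stand-ins for the referent lookups, and the identifier values are examples only.

```ruby
require 'cgi'

# Hypothetical standalone version of the key-building logic.
def build_bibkeys(isbn: nil, oclcnum: nil, lccn: nil, lookup_by_lccn: false)
  keys = []
  keys << "isbn:#{isbn}" if isbn
  # OCLC and LCCN are sent as quoted phrases with the prefix attached.
  keys << %("OCLC#{oclcnum}") if oclcnum
  # LCCN is a last resort, and only when explicitly enabled.
  keys << %("LCCN#{lccn}") if lookup_by_lccn && lccn && keys.empty?

  return nil if keys.empty?
  CGI.escape(keys.join(' OR '))
end

escaped = build_bibkeys(isbn: '9780140449136', oclcnum: '32012617')
puts escaped
puts CGI.unescape(escaped)
```

The second `puts` shows the human-readable query; the first shows the form that is appended to the API URL.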
#handle(request) ⇒ Object
    # File 'app/service_adaptors/google_book_search.rb', line 87
    def handle(request)
      bibkeys = get_bibkeys(request.referent)
      return request.dispatched(self, true) if bibkeys.nil?

      data = do_query(bibkeys, request)

      if data.blank? || data["error"]
        # fail fatal
        return request.dispatched(self, false)
      end

      # 0 hits, return.
      return request.dispatched(self, true) if data["totalItems"] == 0

      enhance_referent(request, data) if @referent_enhance

      # return full views first
      full_views_shown = create_fulltext_service_response(request, data)

      # Add search_inside link if appropriate
      add_search_inside(request, data)

      # only if no full view is shown, add links for partial view or noview
      unless full_views_shown
        do_web_links(request, data)
      end

      thumbnail_url = find_thumbnail_url(data)
      if thumbnail_url
        add_cover_image(request, thumbnail_url)
      end

      return request.dispatched(self, true)
    end
#remove_query_context(url) ⇒ Object
Google gives us a URL to the book that contains a ‘dq’ param with the original query, which for us is an ISBN/LCCN/OCLCnum query, which we don’t actually want to leave in there.
    # File 'app/service_adaptors/google_book_search.rb', line 414
    def remove_query_context(url)
      url.sub(/&dq=[^&]+/, '')
    end
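A short sketch of the substitution on sample URLs. The helper name `strip_dq` and the URLs are made up for illustration; the regular expression is the one from the method above.

```ruby
# Hypothetical helper using the same pattern as remove_query_context.
def strip_dq(url)
  url.sub(/&dq=[^&]+/, '')
end

# 'dq' in the middle of the query string is removed.
puts strip_dq('http://books.google.com/books?id=abc123&dq=isbn:9780140449136&hl=en')

# 'dq' as the very first parameter follows '?', not '&', so it is kept.
puts strip_dq('http://books.google.com/books?dq=isbn:9780140449136&id=abc123')
```

The second call shows a limitation of the pattern: it only matches a `dq` parameter that is preceded by `&`.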
#response_url(service_response, submitted_params) ⇒ Object
Catch url_for call for search_inside, because we’re going to redirect
    # File 'app/service_adaptors/google_book_search.rb', line 419
    def response_url(service_response, submitted_params)
      if ( ! (service_response.service_type_value.name == "search_inside" ))
        return super(service_response, submitted_params)
      else
        # search inside!
        base = service_response[:url]
        query = CGI.escape(submitted_params["query"] || "")
        # attempting to reverse engineer a bit to get 'snippet'
        # style results instead of 'onepage' style results.
        # snippet seem more user friendly, and are what google's own
        # interface seems to give you by default. but 'onepage' is the
        # default from our deep link, but if we copy the JS hash data,
        # it looks like we can get Google to 'snippet'.
        url = base + "&q=#{query}#v=snippet&q=#{query}&f=false"
        return url
      end
    end
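The search-inside redirect URL is plain string concatenation, so it can be sketched without a service response. The helper name `search_inside_url` and the sample base URL are assumptions for illustration.

```ruby
require 'cgi'

# Hypothetical helper building the same URL shape as response_url:
# the query appears once in the query string and once in the fragment.
def search_inside_url(base, query)
  q = CGI.escape(query || '')
  "#{base}&q=#{q}#v=snippet&q=#{q}&f=false"
end

puts search_inside_url('http://books.google.com/books?id=abc123', 'moby dick')
```

Everything after `#` is a URL fragment, so the `v=snippet` setting is read by the page in the browser rather than sent to the server.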
#service_types_generated ⇒ Object
    # File 'app/service_adaptors/google_book_search.rb', line 52
    def service_types_generated
      types = [
        ServiceTypeValue[:fulltext],
        ServiceTypeValue[:cover_image],
        ServiceTypeValue[:highlighted_link],
        ServiceTypeValue[:search_inside],
        ServiceTypeValue[:excerpts]]

      types.push(ServiceTypeValue[:referent_enhance]) if @referent_enhance

      return types
    end