Class: GoogleBookSearch
- Includes:
- MetadataHelper, UmlautHttp
- Defined in:
- app/service_adaptors/google_book_search.rb
Overview
Service that searches Google Book Search to determine viewability. It searches by ISBN, OCLCNUM and/or LCCN.
Uses the Google Books API; see code.google.com/apis/books/docs/v1/getting_started.html and code.google.com/apis/books/docs/v1/using.html
If a full view is available, it returns a fulltext service response. If a partial view is available, it returns a link as “limited excerpts”. If there is no view at all, it still includes a link in highlighted_links, to pay lip service to Google branding requirements.
Unfortunately there is no way to tell which of the noview books provide search, although some do; search is only advertised if a full or partial view is available.
If a thumbnail_url is returned in the responses, a cover image is displayed.
It can also enhance with an abstract, if available. This is off by default; set `abstract: true` to turn it on.
It also fleshes out bibliographic details from an identifier: if all you had was an ISBN, it will fill in title, author, etc. in the referent from the GBS response.
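As a rough sketch of the viewability rules above, assuming the default configuration (fulltext, search_inside and web links all enabled), the hypothetical helper `link_types_for` below maps the viewability values found across all hits to the link types the adaptor would emit. It is not part of the class and ignores cover images, abstracts and referent enhancement.

```ruby
# Viewability strings used in the Books API "accessInfo" (see Constant Summary).
VIEW_FULL    = 'ALL_PAGES'
VIEW_PARTIAL = 'PARTIAL'
VIEW_NONE    = 'NO_PAGES'

# Hypothetical helper: which link types would be produced, in the order
# #handle adds them, given the viewabilities present in the results.
def link_types_for(viewabilities)
  full    = viewabilities.include?(VIEW_FULL)
  partial = viewabilities.include?(VIEW_PARTIAL)
  none    = viewabilities.include?(VIEW_NONE)

  types = []
  types << :fulltext if full
  types << :search_inside if full || partial
  # Web links are only added when no full view was shown.
  unless full
    if partial
      types << :excerpts
    elsif none
      types << :highlighted_link
    end
  end
  types
end
```

For example, a result set containing only `NO_PAGES` hits yields just `[:highlighted_link]`, the branding link described above.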
Google API Key
Setting an API key in :api_key is STRONGLY recommended, or you’ll probably get rate limited (it’s not clear what the limit is with no API key supplied). You may have to ask for a higher rate limit for your API key than the default 1000/day, which you can do through the Google API console: code.google.com/apis/console
I requested 50k with this message, and was quickly approved with no questions: “Services for academic library (Johns Hopkins Libraries) web applications to match Google Books availability to items presented by our catalog, OpenURL link resolver, and other software.”
I also recommend setting your ‘per user limit’ to something crazy high, as well as requesting more quota.
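A minimal sketch of where the key ends up in the request URL, mirroring what #do_query does with `@url`, `@api_key` and the `printType=books` limit. `volumes_query_url` is a hypothetical name; unlike #do_query it CGI-escapes the key, which changes nothing for keys made of letters, digits, `-` and `_`.

```ruby
require 'cgi'

# Base endpoint, as set in GoogleBookSearch#initialize.
VOLUMES_URL = 'https://www.googleapis.com/books/v1/volumes?q='

# `escaped_query` must already be CGI-escaped (as #get_bibkeys returns it).
def volumes_query_url(escaped_query, api_key = nil)
  link = VOLUMES_URL + escaped_query
  # Without a key, requests are likely to be rate limited.
  link += "&key=#{CGI.escape(api_key)}" if api_key
  # Restrict results to books, not magazines.
  link + '&printType=books'
end

puts volumes_query_url(CGI.escape('isbn:9780140449136'), 'MY_KEY')
```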
Constant Summary
- ViewFullValue =
Identifiers used in API response to indicate viewability level
'ALL_PAGES'
- ViewPartialValue =
'PARTIAL'
- ViewNoneValue =
None might also be ‘snippet’, but Google doesn’t want to distinguish
'NO_PAGES'
- ViewUnknownValue =
'UNKNOWN'
Constants inherited from Service
Service::LinkOutFilterTask, Service::StandardTask
Instance Attribute Summary
-
#display_name ⇒ Object
readonly
attr_reader is important for tests.
-
#num_full_views ⇒ Object
readonly
attr_reader is important for tests.
-
#url ⇒ Object
readonly
attr_reader is important for tests.
Attributes inherited from Service
#group, #name, #priority, #request, #service_id, #status, #task
Instance Method Summary
- #add_abstract(request, data) ⇒ Object
- #add_cover_image(request, url) ⇒ Object
- #add_search_inside(request, data) ⇒ Object
-
#build_headers(request) ⇒ Object
We don’t need to fake a proxy request anymore, but we still include X-Forwarded-For so google can return location-appropriate availability.
-
#create_fulltext_service_response(request, data) ⇒ Object
We only create a fulltext service response if we have a full view.
- #do_query(bibkeys, request) ⇒ Object
-
#do_web_links(request, data) ⇒ Object
Create a highlighted_link service response for partial and noview results. Only show one web link.
-
#element_enhance(request, rft_key, value) ⇒ Object
Will not over-write existing referent values.
-
#enhance_referent(request, data) ⇒ Object
Take the FIRST hit from Google, and use its values to enhance our metadata.
- #find_entries(gbs_response, viewabilities) ⇒ Object
-
#find_thumbnail_url(data) ⇒ Object
Not all responses have a thumbnail_url.
-
#fix_pg_gbs_link(url) ⇒ Object
Google Books direct links do weird things with linking to internal pages, perhaps intended to reflect our search criteria (which pages matched), but we’re not using them like that for links to excerpts or full page.
-
#get_bibkeys(rft) ⇒ Object
Returns nil or an escaped string of bibkeys. To increase the chances of a good hit, we send all available bibkeys and later dedupe by id.
- #handle(request) ⇒ Object
-
#initialize(config) ⇒ GoogleBookSearch
constructor
A new instance of GoogleBookSearch.
-
#remove_query_context(url) ⇒ Object
Google gives us a URL to the book that contains a ‘dq’ param with the original query, which for us is an ISBN/LCCN/OCLCnum query, which we don’t actually want to leave in there.
-
#response_url(service_response, submitted_params) ⇒ Object
Catch url_for call for search_inside, because we’re going to redirect.
- #service_types_generated ⇒ Object
Methods included from UmlautHttp
#http_fetch, #proxy_like_headers
Methods included from MetadataHelper
#get_doi, #get_epage, #get_gpo_item_nums, #get_identifier, #get_isbn, #get_issn, #get_lccn, #get_month, #get_oclcnum, #get_pmid, #get_search_creator, #get_search_terms, #get_search_title, #get_spage, #get_sudoc, #get_top_level_creator, #get_year, #normalize_lccn, #normalize_title, #raw_search_title, title_is_serial?
Methods included from MarcHelper
#add_856_links, #edition_statement, #get_title, #get_years, #gmd_values, #service_type_for_856, #should_skip_856_link?, #strip_gmd
Methods inherited from Service
#credits, #handle_wrapper, #link_out_filter, #preempted_by, required_config_params, #translate
Constructor Details
#initialize(config) ⇒ GoogleBookSearch
Returns a new instance of GoogleBookSearch.
# File 'app/service_adaptors/google_book_search.rb', line 73
def initialize(config)
  @url = 'https://www.googleapis.com/books/v1/volumes?q='
  @display_name = 'Google Books'
  # number of full views to show
  @num_full_views = 1
  # default on, to enhance our metadata with stuff from google
  @referent_enhance = true
  # default OFF, add description/abstract from GBS
  @abstract = false
  # Other responses on by default but can be turned off
  @cover_image = true
  @fulltext = true
  @search_inside = true
  @web_links = true # to partial view :excerpts or :fulltext

  # google api key strongly recommended, otherwise you'll
  # probably get rate limited.
  @api_key = nil

  @credits = {
    "Google Books" => "http://books.google.com/"
  }

  # While you can theoretically look up by LCCN on Google Books,
  # we have found FREQUENT false positives. There's no longer any
  # way to even report these to Google. By default, don't lookup
  # by LCCN.
  @lookup_by_lccn = false

  super(config)
end
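The defaults above are all assigned before `super(config)`, presumably so that configured values can override them. A minimal sketch of that pattern, under the ASSUMPTION that the Service base class copies each config key into a same-named instance variable; `SketchService` and `SketchGoogleBookSearch` are hypothetical stand-ins, not Umlaut classes.

```ruby
# Hypothetical stand-in for the Service base class. ASSUMPTION: each config
# key becomes an instance variable of the same name.
class SketchService
  def initialize(config)
    config.each { |key, value| instance_variable_set("@#{key}", value) }
  end
end

class SketchGoogleBookSearch < SketchService
  attr_reader :abstract, :num_full_views, :lookup_by_lccn

  def initialize(config)
    # Defaults first...
    @abstract = false
    @num_full_views = 1
    @lookup_by_lccn = false
    # ...then config values, applied by the superclass, replace them.
    super(config)
  end
end

service = SketchGoogleBookSearch.new('abstract' => true)
```

Here `service.abstract` is `true` while `num_full_views` keeps its default of 1.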
Instance Attribute Details
#display_name ⇒ Object (readonly)
attr_reader is important for tests
# File 'app/service_adaptors/google_book_search.rb', line 55
def display_name
  @display_name
end
#num_full_views ⇒ Object (readonly)
attr_reader is important for tests
# File 'app/service_adaptors/google_book_search.rb', line 55
def num_full_views
  @num_full_views
end
#url ⇒ Object (readonly)
attr_reader is important for tests
# File 'app/service_adaptors/google_book_search.rb', line 55
def url
  @url
end
Instance Method Details
#add_abstract(request, data) ⇒ Object
# File 'app/service_adaptors/google_book_search.rb', line 201
def add_abstract(request, data)
  info = data["items"].first.try {|h| h["volumeInfo"]}
  # Only link to a description if the first hit actually has one.
  if info && info["description"]
    url = info["infoLink"]
    request.add_service_response(
      :service => self,
      :display_text => "Description from Google Books",
      :display_text_i18n => "description",
      :url => remove_query_context(url),
      :service_type_value => :abstract
    )
  end
end
#add_cover_image(request, url) ⇒ Object
# File 'app/service_adaptors/google_book_search.rb', line 452
def add_cover_image(request, url)
  zoom_url = url.clone

  # if we're sent to a page other than the frontcover then strip out the
  # page number and insert front cover
  zoom_url.sub!(/&pg=.*?&/, '&printsec=frontcover&')

  # hack out the 'curl' if we can
  zoom_url.sub!('&edge=curl', '')

  request.add_service_response(
    :service => self,
    :display_text => 'Cover Image',
    :url => zoom_url,
    :size => "medium",
    :service_type_value => :cover_image
  )
end
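To see what the two substitutions in #add_cover_image do, here is a standalone copy applied to a made-up thumbnail URL. `cover_zoom_url` is a hypothetical name and the URL is illustrative, not real Google output.

```ruby
# Same two substitutions as #add_cover_image, applied to a copy of the URL.
def cover_zoom_url(url)
  zoom_url = url.dup
  # Replace a page-specific "&pg=...&" segment with the front cover.
  zoom_url.sub!(/&pg=.*?&/, '&printsec=frontcover&')
  # Drop the page-curl effect parameter.
  zoom_url.sub!('&edge=curl', '')
  zoom_url
end

puts cover_zoom_url('http://books.google.com/books/content?id=abc&pg=PA5&img=1&zoom=1&edge=curl&source=gbs_api')
# => http://books.google.com/books/content?id=abc&printsec=frontcover&img=1&zoom=1&source=gbs_api
```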
#add_search_inside(request, data) ⇒ Object
# File 'app/service_adaptors/google_book_search.rb', line 365
def add_search_inside(request, data)
  # Just take the first one we find, if multiple
  searchable_view = find_entries(data, [ViewFullValue, ViewPartialValue])[0]

  if ( searchable_view )
    url = searchable_view["volumeInfo"]["infoLink"]

    request.add_service_response(
      :service => self,
      :display_text => @display_name,
      :display_text_i18n => "display_name",
      :url => remove_query_context(url),
      :service_type_value => :search_inside
    )
  end
end
#build_headers(request) ⇒ Object
We don’t need to fake a proxy request anymore, but we still include X-Forwarded-For so Google can return location-appropriate availability. If there’s an existing X-Forwarded-For, we respect it and send on its first (originating client) address instead of the direct client IP.
# File 'app/service_adaptors/google_book_search.rb', line 291
def build_headers(request)
  original_forwarded_for = nil
  if (request.http_env && request.http_env['HTTP_X_FORWARDED_FOR'])
    original_forwarded_for = request.http_env['HTTP_X_FORWARDED_FOR']
  end

  # we used to prepare a comma separated list in x-forwarded-for if
  # we had multiple requests, as per the x-forwarded-for spec, but I
  # think Google doesn't like it.
  ip_address = (original_forwarded_for ? original_forwarded_for : request.client_ip_addr.to_s)

  return {} if ip_address.blank?

  # If we've got a comma-separated list from an X-Forwarded-For, we
  # can't send it on to google, google won't accept that, just take
  # the first one in the list, which is actually the ultimate client
  # IP. split returns the whole string if separator isn't found, convenient.
  ip_address = ip_address.split(",").first.to_s.strip

  # If all we have is an internal/private IP from the internal network,
  # do NOT send that to Google, or Google will give you a 503 error
  # and refuse to process your request, as of 7 sep 2011. sigh.
  # Also if it doesn't look like a dotted-quad IP at all, don't send it.
  # Private ranges: 10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16.
  if (ip_address !~ /\A\d+\.\d+\.\d+\.\d+\z/) ||
     ip_address.start_with?("10.") ||
     ip_address =~ /\A172\.(1[6-9]|2\d|3[01])\./ ||
     ip_address.start_with?("192.168.")
    return {}
  else
    return {'X-Forwarded-For' => ip_address }
  end
end
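If you would rather not maintain private-range prefixes by hand, Ruby's standard `IPAddr` class (with `#private?` and `#loopback?`, available since Ruby 2.5) can make the same decision. This is a swapped-in technique sketched under that assumption, not what the adaptor currently does; `forwardable_ip` is a hypothetical helper that returns the address to send, or nil.

```ruby
require 'ipaddr'

# Choose the IPv4 address to put in X-Forwarded-For, or nil if none should
# be sent (missing, malformed, private or loopback).
def forwardable_ip(forwarded_for, client_ip)
  # Only the first (originating) address of a forwarded list is usable.
  candidate = (forwarded_for || client_ip).to_s.split(',').first.to_s.strip
  return nil if candidate.empty?

  addr = IPAddr.new(candidate)
  return nil unless addr.ipv4?
  return nil if addr.private? || addr.loopback?
  candidate
rescue ArgumentError
  # IPAddr raises subclasses of ArgumentError for unparseable input.
  nil
end
```

With this helper, `build_headers` would reduce to returning `{'X-Forwarded-For' => ip}` when `forwardable_ip` gives a non-nil result and `{}` otherwise.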
#create_fulltext_service_response(request, data) ⇒ Object
We only create a fulltext service response if we have a full view. We create only as many full views as are specified in config.
# File 'app/service_adaptors/google_book_search.rb', line 343
def create_fulltext_service_response(request, data)
  full_views = find_entries(data, ViewFullValue)
  return nil if full_views.empty?

  count = 0
  full_views.each do |fv|
    uri = fv["volumeInfo"]["previewLink"]

    request.add_service_response(
      :service => self,
      :display_text => @display_name,
      :display_text_i18n => "display_name",
      :url => remove_query_context(uri),
      :service_type_value => :fulltext
    )

    count += 1
    break if count == @num_full_views
  end
  return true
end
#do_query(bibkeys, request) ⇒ Object
# File 'app/service_adaptors/google_book_search.rb', line 253
def do_query(bibkeys, request)
  headers = build_headers(request)
  link = @url + bibkeys
  if @api_key
    link += "&key=#{@api_key}"
  end

  # Add on limit to only request books, not magazines.
  link += "&printType=books"

  Rails.logger.debug("GoogleBookSearch requesting: #{link}")
  response = http_fetch(link, :headers => headers, :raise_on_http_error_code => false)
  data = MultiJson.load(response.body)

  # If Google gives us an error because it says it can't geo-locate,
  # remove the IP, log warning, and try again.
  if (data["error"] && data["error"]["errors"] && data["error"]["errors"].find {|h| h["reason"] == "unknownLocation"} )
    Rails.logger.warn("GoogleBookSearch: geo-locate error, retrying without X-Forwarded-For: '#{link}' headers: #{headers.inspect} #{response.inspect}\n #{data.inspect}")

    response = http_fetch(link, :raise_on_http_error_code => false)
    data = MultiJson.load(response.body)
  end

  if (! response.kind_of?(Net::HTTPSuccess)) || data["error"]
    Rails.logger.error("GoogleBookSearch error: '#{link}' headers: #{headers.inspect} #{response.inspect}\n #{data.inspect}")
  end

  return data
end
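The geo-location retry in #do_query can be sketched independently of Umlaut's `http_fetch`. In this sketch `fetch` is a hypothetical callable that takes a headers hash and returns a JSON body string; the error shape checked (`error.errors[].reason == "unknownLocation"`) is the one #do_query looks for.

```ruby
require 'json'

# True if a parsed Books API response is the "can't geo-locate" error.
def unknown_location_error?(data)
  error = data['error']
  return false unless error.is_a?(Hash) && error['errors'].is_a?(Array)
  error['errors'].any? { |h| h['reason'] == 'unknownLocation' }
end

# Query once with the given headers; on an unknownLocation error, retry
# once with no headers at all (i.e. without X-Forwarded-For).
def query_with_geo_retry(headers, fetch)
  data = JSON.parse(fetch.call(headers))
  data = JSON.parse(fetch.call({})) if unknown_location_error?(data)
  data
end
```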
#do_web_links(request, data) ⇒ Object
Create a highlighted_link service response for partial and noview results. Only show one web link, preferring a partial view over a noview. Some noviews have a snippet/search, but we have no way to tell.
# File 'app/service_adaptors/google_book_search.rb', line 386
def do_web_links(request, data)
  # some noview items will have a snippet view, but we have no way to tell
  info_views = find_entries(data, ViewPartialValue)
  viewability = ViewPartialValue

  if info_views.blank?
    info_views = find_entries(data, ViewNoneValue)
    viewability = ViewNoneValue
  end

  # Shouldn't ever get to this point, but just in case
  return nil if info_views.blank?

  url = ''
  iv = info_views.first
  type = nil
  if (viewability == ViewPartialValue && url = iv["volumeInfo"]["previewLink"])
    url = fix_pg_gbs_link(url)
    display_text = @display_name
    display_text_i18n = "display_name"
    type = ServiceTypeValue[:excerpts]
  else
    url = iv["volumeInfo"]["infoLink"]
    url = fix_pg_gbs_link(url)
    display_text = "Book Information"
    display_text_i18n = "book_information"
    type = ServiceTypeValue[:highlighted_link]
  end

  request.add_service_response(
    :service => self,
    :url => remove_query_context(url),
    :display_text => display_text,
    :display_text_i18n => display_text_i18n,
    :service_type_value => type
  )
end
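The selection logic of #do_web_links, reduced to plain hashes, is sketched below. `choose_web_link` is hypothetical and skips the `fix_pg_gbs_link`/`remove_query_context` cleanup; it returns the URL and the service type that would be used, or nil.

```ruby
# Prefer the first PARTIAL entry's previewLink (an excerpts link); otherwise
# fall back to the infoLink of that entry or of the first NO_PAGES entry.
def choose_web_link(items)
  partial = items.find { |e| e.dig('accessInfo', 'viewability') == 'PARTIAL' }
  if partial && (url = partial.dig('volumeInfo', 'previewLink'))
    return [url, :excerpts]
  end

  entry = partial || items.find { |e| e.dig('accessInfo', 'viewability') == 'NO_PAGES' }
  return nil unless entry
  [entry.dig('volumeInfo', 'infoLink'), :highlighted_link]
end
```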
#element_enhance(request, rft_key, value) ⇒ Object
Will not over-write existing referent values.
# File 'app/service_adaptors/google_book_search.rb', line 217
def element_enhance(request, rft_key, value)
  if (value)
    request.referent.enhance_referent(rft_key, value.to_s, true, false, :overwrite => false)
  end
end
#enhance_referent(request, data) ⇒ Object
Take the FIRST hit from Google, and use its values to enhance our metadata. Will NOT overwrite existing data.
# File 'app/service_adaptors/google_book_search.rb', line 156
def enhance_referent(request, data)
  entry = data["items"].first
  if (volumeInfo = entry["volumeInfo"])
    title = volumeInfo["title"]
    title += ": #{volumeInfo["subtitle"]}" if (title && volumeInfo["subtitle"])
    element_enhance(request, "title", title)
    element_enhance(request, "au", volumeInfo["authors"].first) if volumeInfo["authors"]
    element_enhance(request, "pub", volumeInfo["publisher"])
    element_enhance(request, "tpages", volumeInfo["pageCount"])
    if (date = volumeInfo["publishedDate"]) && date =~ /^(\d\d\d\d)/
      element_enhance(request, "date", $1)
    end

    # LCCN is only rarely included, but is sometimes, eg:
    # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"LCCN:72627172"}],
    # Also "LCCN:76630875"
    #
    # And sometimes OCLC number like:
    # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"OCLC:12345678"}],
    #
    (volumeInfo["industryIdentifiers"] || []).each do |hash|
      if hash["type"] == "ISBN_13"
        element_enhance(request, "isbn", hash["identifier"])
      elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("LCCN:")
        lccn = normalize_lccn( hash["identifier"].slice(5, hash["identifier"].length) )
        request.referent.add_identifier("info:lccn/#{lccn}")
      elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("OCLC:")
        oclcnum = normalize_lccn( hash["identifier"].slice(5, hash["identifier"].length) )
        request.referent.add_identifier("info:oclcnum/#{oclcnum}")
      end
    end
  end
end
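A standalone sketch of how the `industryIdentifiers` entries are classified, using the same ISBN_13, `LCCN:` and `OCLC:` conventions #enhance_referent looks for. `extract_identifiers` is a hypothetical helper; it skips `normalize_lccn` and returns plain strings instead of enhancing a referent. Keeping only the first ISBN-13 loosely mirrors the `:overwrite => false` behaviour of #element_enhance.

```ruby
# Classify Books API industryIdentifiers into ISBN-13, LCCN and OCLC numbers.
def extract_identifiers(industry_identifiers)
  result = { isbn: nil, lccn: [], oclcnum: [] }
  (industry_identifiers || []).each do |h|
    id = h['identifier'].to_s
    case h['type']
    when 'ISBN_13'
      result[:isbn] ||= id # first one wins
    when 'OTHER'
      if id.start_with?('LCCN:')
        result[:lccn] << id.delete_prefix('LCCN:')
      elsif id.start_with?('OCLC:')
        result[:oclcnum] << id.delete_prefix('OCLC:')
      end
    end
  end
  result
end
```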
#find_entries(gbs_response, viewabilities) ⇒ Object
# File 'app/service_adaptors/google_book_search.rb', line 327
def find_entries(gbs_response, viewabilities)
  unless (viewabilities.kind_of?(Array))
    viewabilities = [viewabilities]
  end

  entries = gbs_response["items"].find_all do |entry|
    viewability = entry["accessInfo"]["viewability"]
    (viewability && viewabilities.include?(viewability))
  end

  return entries
end
#find_thumbnail_url(data) ⇒ Object
Not all responses have a thumbnail_url. We look for them and return the 1st.
# File 'app/service_adaptors/google_book_search.rb', line 439
def find_thumbnail_url(data)
  entries = data["items"].collect do |entry|
    entry["volumeInfo"]["imageLinks"]["thumbnail"] if entry["volumeInfo"] && entry["volumeInfo"]["imageLinks"]
  end

  # remove nil values
  entries.compact!

  # pick the first of the available thumbnails, or nil
  return entries[0]
end
#fix_pg_gbs_link(url) ⇒ Object
Google Books direct links do weird things with linking to internal pages, perhaps intended to reflect our search criteria (which pages matched), but we’re not using them like that for links to excerpts or full page. We reverse engineer the link to go to the full page instead.
# File 'app/service_adaptors/google_book_search.rb', line 432
def fix_pg_gbs_link(url)
  url.sub(/([\?\;\&])(pg=[^;&]+)/, '\1pg=1')
end
#get_bibkeys(rft) ⇒ Object
Returns nil or an escaped string of bibkeys. To increase the chances of a good hit, we send all available bibkeys and later dedupe by id. FIXME: assumes we only have one of each kind of identifier.
# File 'app/service_adaptors/google_book_search.rb', line 228
def get_bibkeys(rft)
  isbn = get_identifier(:urn, "isbn", rft)
  oclcnum = get_identifier(:info, "oclcnum", rft)
  lccn = get_lccn(rft)

  # Google doesn't officially support oclc/lccn search, but does
  # index as token with prefix smashed up right with identifier
  # eg http://books.google.com/books/feeds/volumes?q=OCLC32012617
  #
  # Except turns out doing it as a phrase search is important! Or
  # google's normalization/tokenization does odd things.
  keys = []
  keys << ('isbn:' + isbn) if isbn
  keys << ('"' + "OCLC" + oclcnum + '"') if oclcnum
  # Only use LCCN if we've got nothing else, and we're allowing it.
  # it returns many false positives.
  if @lookup_by_lccn && lccn && keys.length == 0
    keys << ('"' + 'LCCN' + lccn + '"')
  end

  return nil if keys.empty?
  keys = CGI.escape( keys.join(' OR ') )
  return keys
end
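For concreteness, here is the query string #get_bibkeys would produce, with identifiers passed as keyword arguments instead of being read from an OpenURL referent. `bibkeys_for` is a hypothetical name; the key formats and the LCCN-as-last-resort rule are copied from the method above.

```ruby
require 'cgi'

def bibkeys_for(isbn: nil, oclcnum: nil, lccn: nil, lookup_by_lccn: false)
  keys = []
  keys << "isbn:#{isbn}" if isbn
  # OCLC/LCCN are searched as quoted phrases with the prefix attached.
  keys << %("OCLC#{oclcnum}") if oclcnum
  # LCCN only as a last resort, and only if enabled.
  keys << %("LCCN#{lccn}") if lookup_by_lccn && lccn && keys.empty?
  return nil if keys.empty?
  CGI.escape(keys.join(' OR '))
end

puts bibkeys_for(isbn: '9780140449136', oclcnum: '32012617')
# => isbn%3A9780140449136+OR+%22OCLC32012617%22
```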
#handle(request) ⇒ Object
# File 'app/service_adaptors/google_book_search.rb', line 109
def handle(request)
  bibkeys = get_bibkeys(request.referent)
  return request.dispatched(self, true) if bibkeys.nil?

  data = do_query(bibkeys, request)

  if data.blank? || data["error"]
    # fail fatal
    return request.dispatched(self, false)
  end

  # 0 hits, return.
  return request.dispatched(self, true) if data["totalItems"] == 0

  enhance_referent(request, data) if @referent_enhance

  add_abstract(request, data) if @abstract

  # return full views first
  if @fulltext
    full_views_shown = create_fulltext_service_response(request, data)
  end

  if @search_inside
    # Add search_inside link if appropriate
    add_search_inside(request, data)
  end

  # only if no full view is shown, add links for partial view or noview
  unless full_views_shown
    do_web_links(request, data)
  end

  if @cover_image
    thumbnail_url = find_thumbnail_url(data)
    if thumbnail_url
      add_cover_image(request, thumbnail_url)
    end
  end

  return request.dispatched(self, true)
end
#remove_query_context(url) ⇒ Object
Google gives us a URL to the book that contains a ‘dq’ param with the original query, which for us is an ISBN/LCCN/OCLCnum query, which we don’t actually want to leave in there.
# File 'app/service_adaptors/google_book_search.rb', line 474
def remove_query_context(url)
  url.sub(/&dq=[^&]+/, '')
end
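#do_web_links passes each link through #fix_pg_gbs_link and then #remove_query_context. Copies of the two helpers, applied in that order to an illustrative URL, show the combined effect:

```ruby
# Reset any page pointer to pg=1 (copy of #fix_pg_gbs_link).
def fix_pg_gbs_link(url)
  url.sub(/([\?\;\&])(pg=[^;&]+)/, '\1pg=1')
end

# Strip the echoed original query (copy of #remove_query_context).
def remove_query_context(url)
  url.sub(/&dq=[^&]+/, '')
end

url = 'http://books.google.com/books?id=abc&pg=PA42&dq=isbn:9780140449136&source=gbs_api'
puts remove_query_context(fix_pg_gbs_link(url))
# => http://books.google.com/books?id=abc&pg=1&source=gbs_api
```

Note that the `dq` pattern only matches `&dq=`, so a `dq` that happened to be the first query parameter (`?dq=`) would be left in place.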
#response_url(service_response, submitted_params) ⇒ Object
Catch url_for call for search_inside, because we’re going to redirect
# File 'app/service_adaptors/google_book_search.rb', line 479
def response_url(service_response, submitted_params)
  if ( ! (service_response.service_type_value.name == "search_inside" ))
    return super(service_response, submitted_params)
  else
    # search inside!
    base = service_response[:url]
    query = CGI.escape(submitted_params["query"] || "")
    # attempting to reverse engineer a bit to get 'snippet'
    # style results instead of 'onepage' style results.
    # snippet seem more user friendly, and are what google's own
    # interface seems to give you by default. but 'onepage' is the
    # default from our deep link, but if we copy the JS hash data,
    # it looks like we can get Google to 'snippet'.
    url = base + "&q=#{query}#v=snippet&q=#{query}&f=false"

    return url
  end
end
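A sketch of the URL that #response_url builds for search_inside. `search_inside_url` is a hypothetical name, and the base URL stands in for the stored service response URL. The escaped query is sent once as a real `q` parameter and again inside the `#v=snippet` fragment that Google's own page script appears to read.

```ruby
require 'cgi'

def search_inside_url(base, query)
  q = CGI.escape(query.to_s)
  # Real parameter first, then the JS hash state requesting snippet view.
  base + "&q=#{q}#v=snippet&q=#{q}&f=false"
end

puts search_inside_url('http://books.google.com/books?id=abc', 'whale oil')
# => http://books.google.com/books?id=abc&q=whale+oil#v=snippet&q=whale+oil&f=false
```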
#service_types_generated ⇒ Object
# File 'app/service_adaptors/google_book_search.rb', line 57
def service_types_generated
  types = []
  if @web_links
    types.push ServiceTypeValue[:highlighted_link]
    types.push ServiceTypeValue[:excerpts]
  end
  types.push(ServiceTypeValue[:search_inside]) if @search_inside
  types.push(ServiceTypeValue[:fulltext]) if @fulltext
  types.push(ServiceTypeValue[:cover_image]) if @cover_image
  types.push(ServiceTypeValue[:referent_enhance]) if @referent_enhance
  types.push(ServiceTypeValue[:abstract]) if @abstract

  return types
end