Class: GoogleBookSearch

Inherits:
Service
Includes:
MetadataHelper, UmlautHttp
Defined in:
app/service_adaptors/google_book_search.rb

Overview

Service that searches Google Book Search to determine viewability. It searches by ISBN, OCLC number, and/or LCCN.

Uses the Google Books API: code.google.com/apis/books/docs/v1/getting_started.html and code.google.com/apis/books/docs/v1/using.html

If a full view is available, it returns a fulltext service response. If only a partial view is available, it is returned as “limited excerpts”. If there is no view at all, the service still includes a link in highlighted_links, to pay lip service to Google branding requirements.

Unfortunately there is no way to tell which of the noview books provide search, although some do. Search is advertised only if a full or partial view is available.

If a thumbnail_url is returned in the responses, a cover image is displayed.

Google API Key

Setting an API key in :api_key is STRONGLY recommended, or you’ll probably get rate limited (it is not clear what the limit is when no API key is supplied). You may have to ask for a higher rate limit for your API key than the default 1000/day, which you can do through the Google API console: code.google.com/apis/console

I requested 50k with this message, and was quickly approved with no questions: “Services for academic library (Johns Hopkins Libraries) web applications to match Google Books availability to items presented by our catalog, OpenURL link resolver, and other software.”

Recommend setting your ‘per user limit’ to something crazy high, as well as requesting more quota.

Constant Summary

Identifiers used in the API response to indicate viewability level:

ViewFullValue = 'ALL_PAGES'
ViewPartialValue = 'PARTIAL'
ViewNoneValue = 'NO_PAGES'
    None might also be ‘snippet’, but Google doesn’t want to distinguish.
ViewUnknownValue = 'UNKNOWN'

Constants inherited from Service

Service::LinkOutFilterTask, Service::StandardTask

Instance Attribute Summary

Attributes inherited from Service

#group, #name, #priority, #request, #service_id, #status, #task

Instance Method Summary

Methods included from UmlautHttp

#http_fetch, #proxy_like_headers

Methods included from MetadataHelper

#get_doi, #get_epage, #get_gpo_item_nums, #get_identifier, #get_isbn, #get_issn, #get_lccn, #get_month, #get_oclcnum, #get_pmid, #get_search_creator, #get_search_terms, #get_search_title, #get_spage, #get_sudoc, #get_top_level_creator, #get_year, #normalize_lccn, #normalize_title, #raw_search_title, #title_is_serial?

Methods included from MarcHelper

#add_856_links, #edition_statement, #get_title, #get_years, #gmd_values, #service_type_for_856, #should_skip_856_link?, #strip_gmd

Methods inherited from Service

#credits, #handle_wrapper, #link_out_filter, #preempted_by, required_config_params, #response_to_view_data, #view_data_from_service_type

Constructor Details

#initialize(config) ⇒ GoogleBookSearch

Returns a new instance of GoogleBookSearch.



# File 'app/service_adaptors/google_book_search.rb', line 63

def initialize(config)    
  @url = 'https://www.googleapis.com/books/v1/volumes?q='
  
  @display_name = 'Google Books'
  # number of full views to show
  @num_full_views = 1
  # default on, to enhance our metadata with stuff from google
  @referent_enhance = true
  # google api key strongly recommended, otherwise you'll
  # probably get rate limited. 
  @api_key = nil
  
  @credits = {
    "Google Books" => "http://books.google.com/"
  }
  # While you can theoretically look up by LCCN on Google Books,
  # we have found FREQUENT false positives. There's no longer any
  # way to even report these to Google. By default, don't lookup
  # by LCCN. 
  @lookup_by_lccn = false
  
  super(config)
end

Instance Attribute Details

#display_name ⇒ Object (readonly)

attr_reader is important for tests



# File 'app/service_adaptors/google_book_search.rb', line 50

def display_name
  @display_name
end

#num_full_views ⇒ Object (readonly)

attr_reader is important for tests



# File 'app/service_adaptors/google_book_search.rb', line 50

def num_full_views
  @num_full_views
end

#url ⇒ Object (readonly)

attr_reader is important for tests



# File 'app/service_adaptors/google_book_search.rb', line 50

def url
  @url
end

Instance Method Details

#add_cover_image(request, url) ⇒ Object



# File 'app/service_adaptors/google_book_search.rb', line 392

def add_cover_image(request, url)
  zoom_url = url.clone
  
  # if we're sent to a page other than the frontcover then strip out the
  # page number and insert front cover
  zoom_url.sub!(/&pg=.*?&/, '&printsec=frontcover&')
  
  # hack out the 'curl' if we can
  zoom_url.sub!('&edge=curl', '')
  
  request.add_service_response(
      :service=>self, 
      :display_text => 'Cover Image',
      :url => zoom_url, 
      :size => "medium",
      :service_type_value => :cover_image
  )     
end
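
The URL rewriting above can be tried in isolation. The following is a minimal sketch, not part of the class: `cover_image_url` is a hypothetical helper name and the sample thumbnail URL in the usage note is made up for illustration.

```ruby
# Hypothetical standalone version of the rewriting done in add_cover_image.
def cover_image_url(thumbnail_url)
  url = thumbnail_url.dup

  # Replace a page reference with the front cover, if one is present.
  url.sub!(/&pg=.*?&/, "&printsec=frontcover&")

  # Drop the page-curl decoration, if present.
  url.sub!("&edge=curl", "")

  url
end
```

Under these assumptions, a thumbnail URL such as `http://books.google.com/books?id=abc&pg=PA5&img=1&zoom=1&edge=curl` would come back with `&printsec=frontcover` in place of `&pg=PA5` and without `&edge=curl`.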

#add_search_inside(request, data) ⇒ Object



# File 'app/service_adaptors/google_book_search.rb', line 321

def add_search_inside(request, data)
  # Just take the first one we find, if multiple
  searchable_view = find_entries(data, [ViewFullValue, ViewPartialValue])[0]        
  
  if ( searchable_view )
    url = searchable_view["volumeInfo"]["infoLink"]
    
    request.add_service_response( 
      :service => self,
      :display_text=>@display_name,
      :url=> remove_query_context(url),
      :service_type_value => :search_inside
     )                  
  end
  
end

#build_headers(request) ⇒ Object

We don’t need to fake a proxy request anymore, but we still include X-Forwarded-For so google can return location-appropriate availability. If there’s an existing X-Forwarded-For, we respect it and add on to it.



# File 'app/service_adaptors/google_book_search.rb', line 246

def build_headers(request)
  original_forwarded_for = nil
  if (request.http_env && request.http_env['HTTP_X_FORWARDED_FOR'])
    original_forwarded_for = request.http_env['HTTP_X_FORWARDED_FOR']                                  
  end

  # we used to prepare a comma separated list in x-forwarded-for if
  # we had multiple requests, as per the x-forwarded-for spec, but I
  # think Google doesn't like it. 
  
  ip_address = (original_forwarded_for ?
      original_forwarded_for  :
      request.client_ip_addr.to_s)
  
  return {} if ip_address.blank?

  # If we've got a comma-separated list from an X-Forwarded-For, we
  # can't send it on to google, google won't accept that, just take
  # the first one in the list, which is actually the ultimate client
  # IP. split returns the whole string if separator isn't found, convenient.
  ip_address = ip_address.split(",").first.to_s.strip
  
  # If all we have is an internal/private IP from the internal network,
  # do NOT send that to Google, or Google will give you a 503 error
  # and refuse to process your request, as of 7 sep 2011. sigh.
  # Also if it doesn't look like an IP at all, forget it, don't send it.
  # Note: `!~` is required here; `! ip_address =~ ...` negates the string
  # before matching, so the format check never fired. Only the 172.16.x.x
  # part of the private 172.16.0.0/12 range is screened.
  if ((ip_address !~ /\A\d+\.\d+\.\d+\.\d+\z/) || 
     ip_address.start_with?("10.") || 
     ip_address.start_with?("172.16.") || 
     ip_address.start_with?("192.168."))
     return {}
  else    
    return {'X-Forwarded-For' => ip_address }
  end
end
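
As an alternative to prefix matching on strings, the private-address check can be expressed with Ruby's standard `IPAddr` class, which also covers the whole 172.16.0.0/12 block. This is a sketch only: `forwarded_for_header` and `PRIVATE_RANGES` are hypothetical names, and the class itself does not use `IPAddr`.

```ruby
require "ipaddr"

# RFC 1918 private IPv4 ranges that should not be forwarded to Google.
PRIVATE_RANGES = [
  IPAddr.new("10.0.0.0/8"),
  IPAddr.new("172.16.0.0/12"),
  IPAddr.new("192.168.0.0/16")
].freeze

# Hypothetical helper: returns the header hash to send, or {} when the
# value is missing, malformed, not IPv4, or a private address.
def forwarded_for_header(raw_value)
  # The first entry of a comma-separated list is the original client.
  candidate = raw_value.to_s.split(",").first.to_s.strip
  return {} if candidate.empty?

  begin
    ip = IPAddr.new(candidate)
  rescue IPAddr::Error
    return {}
  end

  return {} unless ip.ipv4?
  return {} if PRIVATE_RANGES.any? { |range| range.include?(ip) }

  { "X-Forwarded-For" => candidate }
end
```

The design trade-off is that parsing rejects anything that is not a real address, at the cost of an extra dependency on `ipaddr` from the standard library.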

#create_fulltext_service_response(request, data) ⇒ Object

We only create a fulltext service response if we have a full view. We create only as many full views as are specified in config.



# File 'app/service_adaptors/google_book_search.rb', line 298

def create_fulltext_service_response(request, data)
  display_name = @display_name

  full_views = find_entries(data, ViewFullValue)
  return nil if full_views.empty?
  
  count = 0
  full_views.each do |fv|
    
    uri = fv["volumeInfo"]["previewLink"]
        
    request.add_service_response(
        :service => self, 
        :display_text => display_name, 
        :url => remove_query_context(uri),           
        :service_type_value =>  :fulltext  
    )
    count += 1
    break if count == @num_full_views
  end   
  return true
end
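
The "only as many full views as configured" rule can be illustrated on plain data. A minimal sketch, assuming response items shaped like the ones this class reads (`accessInfo.viewability`, `volumeInfo.previewLink`); `fulltext_links` is a hypothetical helper, not a method of the class.

```ruby
# Hypothetical helper: collect preview links for at most `limit` full-view items.
def fulltext_links(items, limit)
  items
    .select { |item| item.dig("accessInfo", "viewability") == "ALL_PAGES" }
    .first(limit)
    .map { |item| item.dig("volumeInfo", "previewLink") }
end
```

With `limit` set to 1, which is the default for `@num_full_views`, only the first full-view hit produces a link even when Google returns several.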

#do_query(bibkeys, request) ⇒ Object



# File 'app/service_adaptors/google_book_search.rb', line 208

def do_query(bibkeys, request)    
  headers = build_headers(request)
  link = @url + bibkeys
  if @api_key
    link += "&key=#{@api_key}"
  end
  
  # Add on limit to only request books, not magazines. 
  link += "&printType=books"

  Rails.logger.debug("GoogleBookSearch requesting: #{link}")        
  response = http_fetch(link, :headers => headers, :raise_on_http_error_code => false)        
  data = MultiJson.load(response.body)
  
  # If Google gives us an error cause it says it can't geo-locate, 
  # remove the IP, log warning, and try again. 
  
  if (data["error"] && data["error"]["errors"] &&
      data["error"]["errors"].find {|h| h["reason"] == "unknownLocation"} )
    Rails.logger.warn("GoogleBookSearch: geo-locate error, retrying without X-Forwarded-For: '#{link}' headers: #{headers.inspect} #{response.inspect}\n    #{data.inspect}")
    
    response = http_fetch(link, :raise_on_http_error_code => false)        
    data = MultiJson.load(response.body)
      
  end
  
  
  if (! response.kind_of?(Net::HTTPSuccess)) || data["error"]      
    Rails.logger.error("GoogleBookSearch error: '#{link}' headers: #{headers.inspect} #{response.inspect}\n    #{data.inspect}")
  end
      
  return data
end
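
The retry condition in `do_query` depends on the shape of the error body. The sketch below isolates that check using the standard `json` library; `unknown_location_error?` is a hypothetical helper, and the sample body in the usage note is an assumption modelled on the keys the method inspects, not a captured response.

```ruby
require "json"

# Hypothetical helper: true when the parsed response reports that Google
# could not geo-locate the forwarded IP address.
def unknown_location_error?(data)
  errors = data.dig("error", "errors") || []
  errors.any? { |err| err["reason"] == "unknownLocation" }
end
```

For example, parsing `{"error": {"errors": [{"reason": "unknownLocation"}]}}` and passing the result to the helper would signal that the request should be retried without the X-Forwarded-For header.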

#do_web_links(request, data) ⇒ Object

Creates a highlighted_link service response for partial view and noview items. Only one web link is shown, preferring a partial view over a noview. Some noviews have a snippet/search, but we have no way to tell.



# File 'app/service_adaptors/google_book_search.rb', line 341

def do_web_links(request, data)

  # some noview items will have a snippet view, but we have no way to tell
  info_views = find_entries(data, ViewPartialValue)
  viewability = ViewPartialValue
  
  if info_views.blank?
    info_views = find_entries(data, ViewNoneValue)
    viewability = ViewNoneValue  
  end
  
  # Shouldn't ever get to this point, but just in case
  return nil if info_views.blank?
  
  url = ''
  iv = info_views.first
  type = nil
  if (viewability == ViewPartialValue && 
      url = iv["volumeInfo"]["previewLink"])
    display_text = @display_name
    type = ServiceTypeValue[:excerpts]
  else
    url = iv["volumeInfo"]["infoLink"]
    display_text = "Book Information"
    type = ServiceTypeValue[:highlighted_link]
  end
  request.add_service_response( 
      :service=>self,    
      :url=> remove_query_context(url),
      :display_text=>display_text,
      :service_type_value => type    
   )
end
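
The preference order (partial view first, then noview, with a fallback to the info link) can be sketched on plain hashes. `pick_web_link` is a hypothetical helper that returns a small hash instead of adding a service response; the item shape is assumed to match what the method reads.

```ruby
# Hypothetical helper mirroring the selection logic of do_web_links.
def pick_web_link(items)
  by_view = ->(value) do
    items.find { |item| item.dig("accessInfo", "viewability") == value }
  end

  partial = by_view.call("PARTIAL")
  if partial && (url = partial.dig("volumeInfo", "previewLink"))
    return { type: :excerpts, url: url }
  end

  # Fall back to the info link of the partial entry, or of a noview entry.
  entry = partial || by_view.call("NO_PAGES")
  return nil unless entry

  { type: :highlighted_link, url: entry.dig("volumeInfo", "infoLink") }
end
```

A partial-view entry without a previewLink still wins over a noview entry here, but is reported as a plain highlighted link, which matches the branch structure of the method.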

#element_enhance(request, rft_key, value) ⇒ Object

Will not over-write existing referent values.



# File 'app/service_adaptors/google_book_search.rb', line 172

def element_enhance(request, rft_key, value)
  if (value)
    request.referent.enhance_referent(rft_key, value.to_s, true, false, :overwrite => false)
  end
end

#enhance_referent(request, data) ⇒ Object

Take the FIRST hit from Google, and use its values to enhance our metadata. Will NOT overwrite existing data.



# File 'app/service_adaptors/google_book_search.rb', line 126

def enhance_referent(request, data)
  
  entry = data["items"].first
  

  if (volumeInfo = entry["volumeInfo"])
    
    title = volumeInfo["title"]
    title += ": #{volumeInfo["subtitle"]}" if (title && volumeInfo["subtitle"])
    
    element_enhance(request, "title", title)
    element_enhance(request, "au", volumeInfo["authors"].first) if volumeInfo["authors"]
    element_enhance(request, "pub", volumeInfo["publisher"])
    
    element_enhance(request, "tpages", volumeInfo["pageCount"])
    
    if (date = volumeInfo["publishedDate"]) && date =~ /^(\d\d\d\d)/
      element_enhance(request, "date", $1)
    end
    
    # LCCN is only rarely included, but is sometimes, eg:
    # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"LCCN:72627172"}],          
    # Also "LCCN:76630875"
    #
    # And sometimes OCLC number like:
    # "industryIdentifiers"=>[{"type"=>"OTHER", "identifier"=>"OCLC:12345678"}],
    #        
    (volumeInfo["industryIdentifiers"] || []).each do |hash|
      
      if hash["type"] == "ISBN_13"
        element_enhance(request, "isbn", hash["identifier"])
        
      elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("LCCN:")
        lccn = normalize_lccn(  hash["identifier"].slice(5, hash["identifier"].length)  )
        request.referent.add_identifier("info:lccn/#{lccn}")
        
      elsif hash["type"] == "OTHER" && hash["identifier"].starts_with?("OCLC:")
        oclcnum = normalize_lccn(  hash["identifier"].slice(5, hash["identifier"].length)  )
        request.referent.add_identifier("info:oclcnum/#{oclcnum}")
      end
    
    end              
  end            
end
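
Two pieces of the parsing above are easy to check on their own: pulling a four-digit year out of `publishedDate`, and splitting the prefixed `OTHER` identifiers. The helpers below are hypothetical and only sketch that parsing; they do not normalize the values or touch a referent.

```ruby
# Hypothetical helper: leading four-digit year of publishedDate, or nil.
def published_year(volume_info)
  date = volume_info["publishedDate"]
  date && date[/\A\d{4}/]
end

# Hypothetical helper: LCCN and OCLC values found among "OTHER" identifiers,
# e.g. {"type"=>"OTHER", "identifier"=>"LCCN:72627172"}.
def other_identifiers(volume_info)
  (volume_info["industryIdentifiers"] || []).each_with_object({}) do |id, found|
    next unless id["type"] == "OTHER"

    prefix, value = id["identifier"].to_s.split(":", 2)
    found[prefix.downcase.to_sym] = value if %w[LCCN OCLC].include?(prefix) && value
  end
end
```

Keeping the assignment and the regular-expression test as separate steps avoids the operator-precedence trap of writing `date = hash["publishedDate"] && date =~ ...` in a single condition.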

#find_entries(gbs_response, viewabilities) ⇒ Object



# File 'app/service_adaptors/google_book_search.rb', line 282

def find_entries(gbs_response, viewabilities)
  unless (viewabilities.kind_of?(Array))
    viewabilities = [viewabilities]
  end

  entries = gbs_response["items"].find_all do |entry|
    viewability = entry["accessInfo"]["viewability"]
    (viewability && viewabilities.include?(viewability))           
  end

  return entries
end
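
A defensive variant of this filter is sketched below for illustration. `entries_with_viewability` is a hypothetical name; unlike the method above, it tolerates a response without "items" and entries without "accessInfo".

```ruby
# Hypothetical helper: select entries whose viewability is one of the
# requested values. Accepts a single value or an array of values.
def entries_with_viewability(response, viewabilities)
  wanted = Array(viewabilities)

  (response["items"] || []).select do |entry|
    wanted.include?(entry.dig("accessInfo", "viewability"))
  end
end
```

Calling it with `["ALL_PAGES", "PARTIAL"]` corresponds to the lookup `add_search_inside` performs, while a single string corresponds to the full-view lookup.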

#find_thumbnail_url(data) ⇒ Object

Not all responses have a thumbnail_url. We look for them and return the 1st.



# File 'app/service_adaptors/google_book_search.rb', line 379

def find_thumbnail_url(data)
  entries = data["items"].collect do |entry|      
    entry["volumeInfo"]["imageLinks"]["thumbnail"] if entry["volumeInfo"] && entry["volumeInfo"]["imageLinks"]      
  end
  
  # remove nil values
  entries.compact!    
  
  # pick the first of the available thumbnails, or nil
  return entries[0]
end
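
The same lookup can be written with `Hash#dig`, which returns nil when any intermediate key is missing. This is a sketch under the assumption that items follow the `volumeInfo.imageLinks.thumbnail` layout used above; `first_thumbnail` is a hypothetical name.

```ruby
# Hypothetical helper: first thumbnail URL among the items, or nil.
def first_thumbnail(data)
  (data["items"] || [])
    .map { |entry| entry.dig("volumeInfo", "imageLinks", "thumbnail") }
    .compact
    .first
end
```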

#get_bibkeys(rft) ⇒ Object

Returns nil or an escaped string of bibkeys. To increase the chances of a good hit, we send all available bibkeys and later dedupe by id. FIXME: Assumes we only have one of each kind of identifier.



# File 'app/service_adaptors/google_book_search.rb', line 183

def get_bibkeys(rft)
  isbn = get_identifier(:urn, "isbn", rft)
  oclcnum = get_identifier(:info, "oclcnum", rft)
  lccn = get_lccn(rft)

  # Google doesn't officially support oclc/lccn search, but does
  # index as token with prefix smashed up right with identifier
  # eg http://books.google.com/books/feeds/volumes?q=OCLC32012617
  #
  # Except turns out doing it as a phrase search is important! Or
  # google's normalization/tokenization does odd things. 
  keys = []
  keys << ('isbn:' + isbn) if isbn
  keys << ('"' + "OCLC" + oclcnum + '"') if oclcnum
  # Only use LCCN if we've got nothing else, and we're allowing it. 
  # it returns many false positives. 
  if @lookup_by_lccn && lccn && keys.length == 0
    keys << ('"' + 'LCCN' + lccn + '"')
  end
  
  return nil if keys.empty?
  keys = CGI.escape( keys.join(' OR ') )
  return keys
end
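
To see what the escaped query looks like, the key-building logic can be run without a referent. A minimal sketch: `bibkey_query` is a hypothetical helper that takes the identifiers directly as keyword arguments, and the identifier values in the usage note are examples only.

```ruby
require "cgi"

# Hypothetical helper mirroring get_bibkeys, with identifiers passed in.
def bibkey_query(isbn: nil, oclcnum: nil, lccn: nil, lookup_by_lccn: false)
  keys = []
  keys << "isbn:#{isbn}" if isbn
  # OCLC and LCCN are sent as quoted phrases with the prefix attached.
  keys << %("OCLC#{oclcnum}") if oclcnum
  # LCCN is a last resort, and only when explicitly enabled.
  keys << %("LCCN#{lccn}") if lookup_by_lccn && lccn && keys.empty?

  return nil if keys.empty?

  CGI.escape(keys.join(" OR "))
end
```

`bibkey_query(isbn: "0140449132", oclcnum: "32012617")` yields `isbn%3A0140449132+OR+%22OCLC32012617%22`, which is the string appended to the `q=` parameter.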

#handle(request) ⇒ Object



# File 'app/service_adaptors/google_book_search.rb', line 87

def handle(request)

  bibkeys = get_bibkeys(request.referent)
  return request.dispatched(self, true) if bibkeys.nil?

  data = do_query(bibkeys, request)
  
  
  if data.blank? || data["error"]
    # fail fatal
    return request.dispatched(self, false)
  end
  
  # 0 hits, return. 
  return request.dispatched(self, true) if data["totalItems"] == 0
  
  enhance_referent(request, data) if @referent_enhance
  
  #return full views first
  full_views_shown = create_fulltext_service_response(request, data)
  
  # Add search_inside link if appropriate
  add_search_inside(request, data)
  
  # only if no full view is shown, add links for partial view or noview
  unless full_views_shown
    do_web_links(request, data)
  end
  
  thumbnail_url = find_thumbnail_url(data)
  if thumbnail_url
    add_cover_image(request, thumbnail_url)    
  end

  return request.dispatched(self, true)
end

#remove_query_context(url) ⇒ Object

Google gives us a URL to the book that contains a ‘dq’ param with the original query, which for us is an ISBN/LCCN/OCLCnum query, which we don’t actually want to leave in there.



# File 'app/service_adaptors/google_book_search.rb', line 414

def remove_query_context(url)
  url.sub(/&dq=[^&]+/, '')    
end
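
A quick illustration of the substitution, using a made-up URL; `strip_dq` is a hypothetical standalone copy of the method body.

```ruby
# Hypothetical standalone copy of remove_query_context.
def strip_dq(url)
  # Removes the first "&dq=..." parameter, up to the next "&" or end of string.
  url.sub(/&dq=[^&]+/, "")
end
```

Note that the pattern requires a leading `&`, so a `dq` parameter that directly follows the `?` would be left in place.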

#response_url(service_response, submitted_params) ⇒ Object

Catch url_for call for search_inside, because we’re going to redirect



# File 'app/service_adaptors/google_book_search.rb', line 419

def response_url(service_response, submitted_params)
  if ( ! (service_response.service_type_value.name == "search_inside" ))
    return super(service_response, submitted_params)
  else
    # search inside!
    base = service_response[:url]
    query = CGI.escape(submitted_params["query"] || "")
    # attempting to reverse engineer a bit to get 'snippet'
    # style results instead of 'onepage' style results. 
    # snippet seem more user friendly, and are what google's own
    # interface seems to give you by default. but 'onepage' is the
    # default from our deep link, but if we copy the JS hash data,
    # it looks like we can get Google to 'snippet'.       
    url = base + "&q=#{query}#v=snippet&q=#{query}&f=false"
    return url
  end
end
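
The snippet-style deep link built above can be reproduced on its own. A minimal sketch: `snippet_search_url` is a hypothetical helper, and the base URL in the usage note is a placeholder.

```ruby
require "cgi"

# Hypothetical helper: append the user's query both as a normal parameter
# and in the URL fragment that asks Google for 'snippet' style results.
def snippet_search_url(base, query)
  escaped = CGI.escape(query.to_s)
  "#{base}&q=#{escaped}#v=snippet&q=#{escaped}&f=false"
end
```

For a base of `http://books.google.com/books?id=abc` and the query `white whale`, the result is `http://books.google.com/books?id=abc&q=white+whale#v=snippet&q=white+whale&f=false`.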

#service_types_generated ⇒ Object



# File 'app/service_adaptors/google_book_search.rb', line 52

def service_types_generated
  types= [
    ServiceTypeValue[:fulltext], 
    ServiceTypeValue[:cover_image],
    ServiceTypeValue[:highlighted_link],
    ServiceTypeValue[:search_inside],
    ServiceTypeValue[:excerpts]]
  types.push(ServiceTypeValue[:referent_enhance]) if @referent_enhance
  return types
end