Class: OpenLibrary

Inherits:
Service
Includes:
MetadataHelper
Defined in:
lib/service_adaptors/open_library.rb

Overview

EXPERIMENTAL and incomplete. Searches Open Library for full text and cover images. To some extent this duplicates what the InternetArchive service does, but it uses the OpenLibrary API.

Right now this service searches only on ISBN, OCLC number, and LCCN identifiers, not on title/author keywords.

Only a subset of OL cover images are actually available via the API (those submitted by users). Here is an example: ?rft.isbn=0921307802. The size of the images returned is unpredictable, and they can sometimes be huge, so the service counts on enforced resizing via img tag attributes.

Constant Summary

Constants inherited from Service

Service::LinkOutFilterTask, Service::StandardTask

Instance Attribute Summary

Attributes inherited from Service

#name, #priority, #request, #service_id, #session_id, #status, #task

Instance Method Summary

Methods included from MetadataHelper

#get_doi, #get_gpo_item_nums, #get_identifier, #get_isbn, #get_issn, #get_lccn, #get_oclcnum, #get_pmid, #get_search_creator, #get_search_terms, #get_search_title, #get_sudoc, #get_top_level_creator, #get_year, #normalize_lccn, #normalize_title, #raw_search_title, #title_is_serial?

Methods included from MarcHelper

#add_856_links, #edition_statement, #get_title, #get_years, #gmd_values, #service_type_for_856, #should_skip_856_link?, #strip_gmd

Methods inherited from Service

#credits, #display_name, #handle_wrapper, #link_out_filter, #preempted_by, required_config_params, #response_to_view_data, #response_url, #view_data_from_service_type

Constructor Details

#initialize(config) ⇒ OpenLibrary

Returns a new instance of OpenLibrary.



# File 'lib/service_adaptors/open_library.rb', line 38

def initialize(config)
  @api_url = "http://openlibrary.org/api"
  @display_name = "Open Library"
  # in case the structure of an empty response changes 
  @empty_response = {"result" => [], "status" => "ok"}
  @num_full_views = 1

  # Can turn on and off each type of service
  @get_fulltext = true
  @get_covers = true
  @enhance_metadata = true
  
  # openlibrary goes straight to the flipbook; archive.org to main page
  @fulltext_base_url = 'http://archive.org/details' #'http://openlibrary.org/details'
  @download_link = true
  
  @credits = {
    "OpenLibrary" => "http://openlibrary.org/"
  }
  
  super(config)
end

Instance Attribute Details

#url ⇒ Object (readonly)

Returns the value of attribute url.



# File 'lib/service_adaptors/open_library.rb', line 21

def url
  @url
end

Instance Method Details

#add_cover_image(request, editions) ⇒ Object



# File 'lib/service_adaptors/open_library.rb', line 237

def add_cover_image(request, editions)
  cover_image = find_coverimages(editions)
  return nil if cover_image.blank?
  #FIXME need to add other sizes
  #FIXME correct @urls and use one of those
  url = "http://openlibrary.org" + cover_image
  request.add_service_response(
        :service=>self, 
        :display_text => 'Cover Image',
        :key=> 'medium', 
        :url => url, 
        :size => 'medium',
        :service_type_value => :cover_image)
end

#bytes_to_mb(bytes) ⇒ Object



# File 'lib/service_adaptors/open_library.rb', line 233

def bytes_to_mb(bytes)
  bytes / (1024.0 * 1024.0)
end
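
A small usage sketch of how the converted value ends up in the note that create_download_link attaches. The method body is copied from above; `size_note` is a hypothetical helper and the byte counts are made up for illustration.

```ruby
# Standalone copy of the conversion, for illustration only.
def bytes_to_mb(bytes)
  bytes / (1024.0 * 1024.0)
end

# Hypothetical helper: formats the size the same way create_download_link
# builds its :notes value (one decimal place plus a unit).
def size_note(bytes)
  ("%.1f" % bytes_to_mb(bytes)) + " MB"
end

puts size_note(5 * 1024 * 1024) # prints "5.0 MB"
puts size_note(1_572_864)       # prints "1.5 MB"
```

Dividing by a Float keeps the fractional part, which integer division would discard.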

#create_download_link(request, editions) ⇒ Object

Creates a highlighted_link for download of the PDF for the first edition listed.

TODO: If the first one doesn’t have a download, try the second? In general, we need a better way of grouping ALL the results available for the user.



# File 'lib/service_adaptors/open_library.rb', line 186

def create_download_link(request, editions)
  # 0 is truthy in Ruby, so testing editions.length would not catch an empty array
  return nil if editions.blank?
  ed = editions[0]
  return nil if ed['ocaid'].blank?
  server = "www.archive.org"
  pdf = "/download/"<< ed['ocaid'] << "/" << 
    ed['ocaid'] << ".pdf"
  url = "http://" << server << pdf
  
  bytes = determine_download_size(server, pdf)
  return nil if bytes.nil? || bytes == 0
  
  note = bytes_to_mb(bytes)

  
  request.add_service_response(
        :service=>self, 
        :display_text=>"Download: " << ed['title'], 
        :url=>url, 
        :notes=> ("%.1f" %  note) + " MB",
        :service_type_value => :highlighted_link ) 
end
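
The URL construction above can be sketched without touching the network. `download_parts` is a hypothetical helper, and the identifier passed to it is made up; only the server name and path layout come from the method above.

```ruby
# Hypothetical helper mirroring the string building in create_download_link.
# Returns the server, the path, and the full URL for a given ocaid.
def download_parts(ocaid)
  server = "www.archive.org"
  path = "/download/#{ocaid}/#{ocaid}.pdf"
  { :server => server, :path => path, :url => "http://#{server}#{path}" }
end

parts = download_parts("exampleitem00auth") # made-up identifier
puts parts[:url]
```

String interpolation builds new strings here, rather than appending to string literals with `<<`.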

#create_fulltext_service_responses(request, editions) ⇒ Object



# File 'lib/service_adaptors/open_library.rb', line 163

def create_fulltext_service_responses(request, editions)
  count = 0
  #note = @note
  editions.each do |ed|
    title = ed['title']
    url = @fulltext_base_url + '/' +ed['ocaid']
    request.add_service_response(
        :service=>self, 
        :display_text=>@display_name, 
        :url=>url, 
        :notes=>title, 
        :service_type_value =>  :fulltext ) 
    
    count += 1
    break if count == @num_full_views
  end  
end
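
A rough sketch of which links the loop above produces. `fulltext_urls` is a hypothetical helper, the constant stands in for @fulltext_base_url, and the ocaid values are invented. For a limit of 1 or more it should select the same editions as the count/break loop.

```ruby
# Stand-in for @fulltext_base_url as set in #initialize.
FULLTEXT_BASE_URL = "http://archive.org/details"

# Hypothetical helper: detail-page URLs for at most `limit` editions, in order.
def fulltext_urls(editions, limit = 1)
  editions.first(limit).map { |ed| FULLTEXT_BASE_URL + "/" + ed["ocaid"] }
end

editions = [{ "ocaid" => "exampleitem00auth" }, { "ocaid" => "otheritem00auth" }]
puts fulltext_urls(editions)    # one URL, matching @num_full_views = 1
puts fulltext_urls(editions, 2) # both URLs
```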

#determine_download_size(server, pdf) ⇒ Object

The server redirects, so we actually have to do two HEAD requests to get the actual content length. Returns bytes as an integer.



# File 'lib/service_adaptors/open_library.rb', line 211

def determine_download_size(server, pdf)
  real_location = ''
  Net::HTTP.start(server, 80) do |http|
    # Send a HEAD request
    response = http.head(pdf)      
    # Get the real location
    real_location = response['Location']
  end    
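  # No redirect (or a Location that is not an http URL) means we cannot find the real file
  m = real_location.to_s.match(/http:\/\/(.*?)(\/.*)/)
  return nil if m.nil?
  real_server = m[1]
  real_pdf = m[2]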
  Net::HTTP.start(real_server, 80) do |http|
    # Send a HEAD request
    resp = http.head(real_pdf)

    return nil if resp.kind_of?(Net::HTTPServerError) || resp.kind_of?(Net::HTTPClientError) 
    
    bytes = resp['Content-Length'].to_i
    return bytes
  end
end
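
The step between the two HEAD requests, splitting the Location header into a host and a path, can be tried in isolation. This is a minimal sketch: `split_location` is a hypothetical helper, and the host name is a placeholder, not a real archive.org server.

```ruby
# Hypothetical helper: splits an absolute http URL into [host, path] using the
# same pattern as determine_download_size. Returns nil when the header is
# missing or does not match (for example, an https URL).
def split_location(location)
  m = location.to_s.match(/http:\/\/(.*?)(\/.*)/)
  m && [m[1], m[2]]
end

p split_location("http://files.example.org/1/items/foo/foo.pdf")
p split_location(nil)
```

The lazy `(.*?)` stops at the first slash after the scheme, so everything from that slash onward becomes the path for the second request.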

#do_id_query(ids) ⇒ Object

Returns only the unique keys from all the results.



# File 'lib/service_adaptors/open_library.rb', line 110

def do_id_query(ids)
  responses = []
  ids.each do |k, v|
    new_key_value = map_key(k, v)
    next if new_key_value.blank? #we probably have bad ISBN, could be bad key though
    responses <<  get_thing(new_key_value)
  end
  selected = responses.map { |r| r['result'] }.flatten.compact.uniq
  return selected
end
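
The final reduction can be illustrated on canned data. The responses below are hypothetical and only follow the shape of @empty_response from #initialize (a "result" array plus a "status"); the key strings are invented, and `unique_keys` is a hypothetical name.

```ruby
# Hypothetical responses, one per identifier queried.
SAMPLE_RESPONSES = [
  { "result" => ["/b/OL1M", "/b/OL2M"], "status" => "ok" },
  { "result" => ["/b/OL2M"],            "status" => "ok" },
  { "result" => [],                     "status" => "ok" }
]

# Same reduction as do_id_query: collect the result arrays, flatten them,
# drop nils, and de-duplicate while keeping first-seen order.
def unique_keys(responses)
  responses.map { |r| r["result"] }.flatten.compact.uniq
end

p unique_keys(SAMPLE_RESPONSES) # prints ["/b/OL1M", "/b/OL2M"]
```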

#enhance_metadata(referent, editions) ⇒ Object



# File 'lib/service_adaptors/open_library.rb', line 260

def enhance_metadata(referent, editions)
  # Which one should we use to enhance? Whichever has the largest
  # oclcnum, or if none of them have an oclcnum, then whichever
  # has the most metadata elements. 
  winner = nil
  winner_oclcnum = 0
  winner_numfields = 0
  editions.each do |e|
    score = score_metadata(e)
    if ( ( score[:oclcnum] && score[:oclcnum] > winner_oclcnum ) ||
         ( winner_oclcnum == 0 && score[:numfields] > winner_numfields)) 
         winner = e
         winner_oclcnum = score[:oclcnum] if score[:oclcnum]
         winner_numfields = score[:numfields]
    end
  end

  if (winner)
    referent.enhance_referent("title", winner["title"], true, false, {:overwrite=>false}) unless winner["title"].blank?
    
    referent.enhance_referent("pub", winner["publishers"].join(","), true, false, {:overwrite=>false}) unless winner["publishers"].blank?
    
    referent.enhance_referent("date", winner["publish_date"], true, false, {:overwrite=>false}) if winner["publish_date"] =~ /^\d\d\d\d$/
    
    referent.enhance_referent("place", winner["publish_places"].join(","), true, false, {:overwrite=>false}) unless winner["publish_places"].blank?
    
    referent.enhance_referent("lccn", winner["lccn"][0], true, false, {:overwrite=>false}) unless winner["lccn"].blank?

    # ISBN, prefer 13 if possible
    referent.enhance_referent("isbn", winner["isbn_13"][0], true, false, {:overwrite=>false}) unless winner["isbn_13"].blank?
    
    referent.enhance_referent("isbn", winner["isbn_10"][0], true, false, {:overwrite=>false}) if winner["isbn_13"].blank? && ! winner["isbn_10"].blank?

    referent.enhance_referent("oclcnum", winner["oclc_numbers"][0], true, false, {:overwrite=>false}) unless winner["oclc_numbers"].blank?
    
  end    
    
end
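
The selection rule in the loop above is easier to follow on precomputed scores. The sketch below is an illustration only: `pick_winner` is a hypothetical helper that returns the index of the winning edition, and the score hashes are made up in the shape that #score_metadata returns.

```ruby
# Hypothetical helper: index of the winning score, using the same rule as
# enhance_metadata. A larger OCLC number always wins; field counts only
# matter while no edition with an OCLC number has been seen.
def pick_winner(scores)
  winner = nil
  winner_oclcnum = 0
  winner_numfields = 0
  scores.each_with_index do |score, i|
    if (score[:oclcnum] && score[:oclcnum] > winner_oclcnum) ||
       (winner_oclcnum == 0 && score[:numfields] > winner_numfields)
      winner = i
      winner_oclcnum = score[:oclcnum] if score[:oclcnum]
      winner_numfields = score[:numfields]
    end
  end
  winner
end

scores = [
  { :oclcnum => nil, :numfields => 5 },
  { :oclcnum => 100, :numfields => 2 },
  { :oclcnum => 50,  :numfields => 7 }
]
p pick_winner(scores) # prints 1
```

The edition with OCLC number 100 wins even though it has the fewest fields, which matches the stated preference in the comment at the top of the method.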

#find_coverimages(editions) ⇒ Object

Picks the first of the cover images found.



# File 'lib/service_adaptors/open_library.rb', line 253

def find_coverimages(editions)
  images = editions.map{|ed| ed['coverimage']}.compact
  # filter out fake ones
  images.reject! { |url| url =~ /book\.trans\.gif$/ }
  return images[0]
end
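
A quick illustration of the filtering, on invented edition records. The image paths are placeholders; only the book.trans.gif pattern comes from the method above, and `first_real_cover` is a hypothetical name for the standalone copy.

```ruby
# Standalone copy of the selection logic, under a hypothetical name.
def first_real_cover(editions)
  images = editions.map { |ed| ed["coverimage"] }.compact
  # Drop the transparent placeholder image that stands for "no cover".
  images.reject! { |url| url =~ /book\.trans\.gif$/ }
  images[0]
end

editions = [
  { "title" => "No cover field" },
  { "coverimage" => "/static/images/book.trans.gif" },
  { "coverimage" => "/static/files/example-cover.jpg" }
]
p first_real_cover(editions) # prints "/static/files/example-cover.jpg"
```

The result is a site-relative path, which is why add_cover_image prefixes it with "http://openlibrary.org".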

#get_data(request) ⇒ Object



# File 'lib/service_adaptors/open_library.rb', line 66

def get_data(request)
  ids = get_identifiers(request.referent)
  return nil if ids.blank?
  ol_keys = do_id_query(ids)    
  return nil if ol_keys.blank?
  
  editions = get_editions(ol_keys)
  return nil if editions.blank?

  enhance_metadata(request.referent, editions) if @enhance_metadata
  
  add_cover_image(request, editions) if @get_covers

  if ( @get_fulltext)
    full_text_editions = select_fulltext(editions)
    unless full_text_editions.blank?
      create_fulltext_service_responses(request, full_text_editions)
      create_download_link(request, full_text_editions) if @download_link
    end
  end
  
  # Open Library metadata looks messy and incomplete right now.
  # If only one edition is returned, we could return a highlighted link;
  # otherwise it is best to just leave it off.
  if editions.length == 1
    # FIXME add this method
    #create_highlighted_link(request, editions)
  end

end

#get_editions(ol_keys) ⇒ Object

Contacts OL and gets data records for editions/manifestations matching any of the keys we have.



# File 'lib/service_adaptors/open_library.rb', line 130

def get_editions(ol_keys)
  editions = []
  ol_keys.each do |k|
    link = @api_url + "/get?key=" + k
    resp = open(link).read
    editions << JSON.parse(resp)['result']
  end
  return editions
end

#get_identifiers(rft) ⇒ Object



# File 'lib/service_adaptors/open_library.rb', line 97

def get_identifiers(rft)
  isbn = get_identifier(:urn, "isbn", rft)
  oclcnum = get_identifier(:info, "oclcnum", rft)
  lccn = get_identifier(:info, "lccn", rft)
  
  h = {}
  h['isbn'] = isbn unless isbn.blank?
  h['oclcnum'] = oclcnum unless oclcnum.blank?
  h['lccn'] = lccn unless lccn.blank?
  return h
end

#get_thing(query_hash) ⇒ Object

Given a hash as a query, returns the parsed JSON response as a hash.



# File 'lib/service_adaptors/open_library.rb', line 122

def get_thing(query_hash)
  query = {"type" => "/type/edition"}.merge(query_hash)
  response = open(@api_url + "/things?query=" + CGI.escape(query.to_json) ).read
  JSON.parse(response)
end
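
To see what request this produces without contacting Open Library, the URL can be built on its own. A minimal sketch: `things_url` is a hypothetical helper, and the constant stands in for @api_url from #initialize. The ISBN is the one quoted in the overview.

```ruby
require "json"
require "cgi"

# Stand-in for @api_url as set in #initialize.
API_URL = "http://openlibrary.org/api"

# Hypothetical helper: the URL get_thing would fetch, without fetching it.
def things_url(query_hash)
  # Every query is restricted to edition records.
  query = { "type" => "/type/edition" }.merge(query_hash)
  API_URL + "/things?query=" + CGI.escape(query.to_json)
end

url = things_url("isbn_10" => "0921307802")
puts url

# Decoding the query parameter recovers the original hash.
p JSON.parse(CGI.unescape(url.split("query=", 2).last))
```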

#handle(request) ⇒ Object



# File 'lib/service_adaptors/open_library.rb', line 61

def handle(request)
  get_data(request)    
  return request.dispatched(self,true)
end

#map_key(k, v) ⇒ Object



# File 'lib/service_adaptors/open_library.rb', line 140

def map_key(k, v)
  new_key = case k
  when "lccn" then "lccn"
  when "oclcnum" then "oclc_numbers"
  when "isbn"
    if v.length == 10
      "isbn_10"
    elsif v.length == 13
      "isbn_13"
    end
  end
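  # No mapping (unknown key, or an ISBN that is neither 10 nor 13 characters):
  # return nil so that callers can skip it with a blank? check
  return nil if new_key.nil?
  return { new_key => v }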
end
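
A standalone sketch of the mapping from referent identifiers to Open Library field names. `map_key_sketch` is a hypothetical name, and returning nil for identifiers that do not map is this sketch's own choice, made so that a caller's blank? check can skip them.

```ruby
# Hypothetical standalone variant of the mapping.
def map_key_sketch(k, v)
  new_key = case k
            when "lccn" then "lccn"
            when "oclcnum" then "oclc_numbers"
            when "isbn"
              # Open Library stores 10- and 13-digit ISBNs in separate fields.
              { 10 => "isbn_10", 13 => "isbn_13" }[v.length]
            end
  new_key && { new_key => v }
end

p map_key_sketch("isbn", "0921307802")
p map_key_sketch("oclcnum", "12345")
p map_key_sketch("isbn", "123")
```

The length test works on the raw string, so an ISBN that still contains hyphens would not map to either field.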

#score_metadata(edition) ⇒ Object

Scores an edition in terms of how good its metadata is. Returns a hash with two keys: :oclcnum is the highest OCLC number (or nil), and :numfields is the number of complete metadata elements. We like an OCLCnum, especially a higher one, and we like more elements.



# File 'lib/service_adaptors/open_library.rb', line 304

def score_metadata(edition)
  oclcnum = edition["oclc_numbers"].collect {|i| i.to_i}.max unless edition["oclc_numbers"].blank?
  oclcnum = nil if oclcnum == 0

  score = 0
  ["title", "publish_places", "publishers", "publish_date", "isbn_10", "isbn_13", "lccn"].each do |key|
    score = score + 1 unless edition[key].blank?
  end

  return {:oclcnum => oclcnum, :numfields => score}
end
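
The scoring can be tried in plain Ruby on a made-up edition record. In this sketch `blank_value?` is a simplified, hypothetical stand-in for ActiveSupport's blank? (it does not treat whitespace-only strings as blank), `score_edition` is a hypothetical name, and all field values are invented.

```ruby
SCORED_KEYS = ["title", "publish_places", "publishers", "publish_date",
               "isbn_10", "isbn_13", "lccn"]

# Simplified stand-in for blank?, so the sketch runs without Rails.
def blank_value?(v)
  v.nil? || (v.respond_to?(:empty?) && v.empty?)
end

def score_edition(edition)
  nums = edition["oclc_numbers"]
  # Highest OCLC number, or nil when there is none (or it parses to 0).
  oclcnum = blank_value?(nums) ? nil : nums.collect { |i| i.to_i }.max
  oclcnum = nil if oclcnum == 0
  numfields = SCORED_KEYS.count { |key| !blank_value?(edition[key]) }
  { :oclcnum => oclcnum, :numfields => numfields }
end

edition = {
  "title" => "Example Title",
  "publishers" => ["Example Press"],
  "publish_date" => "1999",
  "isbn_10" => ["0921307802"],
  "oclc_numbers" => ["12345", "678"]
}
p score_edition(edition)
```

For this record four of the seven scored fields are present, and 12345 is the larger of the two OCLC numbers.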

#select_fulltext(editions) ⇒ Object

Right now we only know that a work has fulltext if it has an ocaid. The check lives in its own method in case we discover other ways to determine fulltext availability.



# File 'lib/service_adaptors/open_library.rb', line 157

def select_fulltext(editions)
  editions.select do |ed|
    ! ed['ocaid'].blank?
  end
end
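
A short illustration of the filter on invented records. `editions_with_fulltext` is a hypothetical plain-Ruby equivalent that replaces blank? with an explicit empty-string check; the ocaid value is made up.

```ruby
# Hypothetical plain-Ruby equivalent of select_fulltext: keeps editions
# whose ocaid is present and not empty or whitespace.
def editions_with_fulltext(editions)
  editions.select { |ed| !ed["ocaid"].to_s.strip.empty? }
end

editions = [
  { "title" => "Scanned",    "ocaid" => "exampleitem00auth" },
  { "title" => "Not scanned" },
  { "title" => "Empty id",   "ocaid" => "" }
]
p editions_with_fulltext(editions).map { |ed| ed["title"] } # prints ["Scanned"]
```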

#service_types_generated ⇒ Object



# File 'lib/service_adaptors/open_library.rb', line 23

def service_types_generated
  
  types = Array.new
  types.push( ServiceTypeValue[:fulltext]) if @get_fulltext
  types.push( ServiceTypeValue[:highlighted_link]) if @get_fulltext
  types.push( ServiceTypeValue[:cover_image]) if @get_covers 

  return types
  
  # FIXME add these service types
  #ServiceTypeValue[:table_of_contents]
  #ServiceTypeValue[:search_inside]

end