Class: Sfx

Inherits:
Service
Defined in:
lib/service_adaptors/sfx.rb

Overview

NOTE: In your SFX Admin, under Menu Configuration / API, you should enable ALL ‘extra’ API information for full Umlaut functionality. The one exception is “Include openURL parameter”, which has no apparent use here.

Config parameters in services.yml:

display_name: User-displayable name for this service.

base_url: SFX base URL.

click_passthrough: DEPRECATED because it caused problems; use the SFXBackchannelRecord link filter service instead. When set to true, Umlaut sends all SFX clicks through SFX so that SFX can capture statistics. This is currently done using a backdoor into the SFX sfxresolver.cgi script. Defaults to false. Note that after SFX requests have been removed by the nightly job, click passthrough will cause an error. Set sfx_requests_expire_crontab to the crontab pattern you use to expire requests, and Umlaut will not attempt click passthrough with expired requests.

sfx_requests_expire_crontab: Crontab pattern that the SFX admin uses to expire SFX requests. Used to avoid click passthrough with expired requests, since that is bound to fail.

services_of_interest: Optional. Overrides the built-in list of which SFX service types to collect and which Umlaut types they map to. A hash with the SFX service type name as key and the Umlaut ServiceTypeValue name as value.

extra_targets_of_interest: SFX target_names of targets that should always be included in Umlaut. A hash with target_name as key and the Umlaut ServiceTypeValue name as value.

really_distant_relationships: An array of relationship type codes from SFX "related objects" (see Table 18 in the SFX 3.0 User's Manual). Related objects that have only a "really distant relationship" will NOT be shown as fulltext; instead they are moved to the see-also "highlighted_link" section. You must turn ON display of related objects in the SFX display admin to get related objects in Umlaut at all. NOTE: This parameter defaults to CONTINUES_IN_PART, CONTINUED_IN_PART_BY, ABSORBED_IN_PART and ABSORBED_BY; set it to an empty array [] to eliminate the defaults.

sfx_timeout: Timeout in seconds, used as both the open and the read timeout for the SFX connection. Defaults to 8.
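As a rough illustration of the parameters above, here is a hypothetical services.yml fragment parsed with Ruby's standard YAML library. Only the keys come from the list above; the display name, base URL, crontab pattern and target name are made up, and the surrounding services.yml structure that Umlaut expects for each service is not shown.

```ruby
require 'yaml'

# Hypothetical values; only the key names come from the documented parameters.
SFX_CONFIG_YAML = <<~YML
  display_name: Find It
  base_url: http://sfx.example.edu/sfx_local
  sfx_timeout: 8
  sfx_requests_expire_crontab: "0 2 * * *"
  extra_targets_of_interest:
    LOCAL_HYPOTHETICAL_TARGET: highlighted_link
  services_of_interest:
    getFullTxt: fulltext
    getAbstract: abstract
  really_distant_relationships: []
YML

config = YAML.safe_load(SFX_CONFIG_YAML)

# An empty array here disables the built-in really_distant_relationships defaults.
puts config["really_distant_relationships"].inspect
puts config["services_of_interest"]["getFullTxt"]
```

Because services_of_interest overrides the built-in list, a fragment like this one would drop any default SFX service types it does not repeat.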

Constant Summary

Constants inherited from Service

Service::LinkOutFilterTask, Service::StandardTask

Instance Attribute Summary

Attributes inherited from Service

#name, #priority, #request, #service_id, #session_id, #status, #task, #url

Class Method Summary

Instance Method Summary

Methods inherited from Service

#credits, #display_name, #handle_wrapper, #link_out_filter, #preempted_by, required_config_params, #response_to_view_data, #view_data_from_service_type

Constructor Details

#initialize(config) ⇒ Sfx

Returns a new instance of Sfx.



# File 'lib/service_adaptors/sfx.rb', line 49

def initialize(config)

  # Key is sfx service_type, value is umlaut servicetype string.
  # These are the SFX service types we will translate to umlaut
  @services_of_interest = {'getFullTxt'          => 'fulltext',
                           'getSelectedFullTxt'  => 'fulltext',
                           'getDocumentDelivery' => 'document_delivery',                         
                           'getDOI'              => 'highlighted_link',
                           'getAbstract'         => 'abstract',
                           'getTOC'              => 'table_of_contents'}

  # Special targets. Key is SFX target_name.
  # Value is umlaut service type.
  # These targets will be included even if their sfx service_type doesn't
  # match our services_of_interest, and the umlaut service ID string given
  # here will take precedence and be used even if these targets DO match
  # services_of_interest. Generally loaded from yml config in super.    
  @extra_targets_of_interest = {}

  @sfx_timeout = 8

  @really_distant_relationships = ["CONTINUES_IN_PART", "CONTINUED_IN_PART_BY", "ABSORBED_IN_PART", "ABSORBED_BY"]
  
  # Include a CrossRef credit, because SFX uses the CrossRef API internally,
  # and CrossRef ToS may require us to give credit.
  @credits = {
    "SFX" => "http://www.exlibrisgroup.com/sfx.htm",
    "CrossRef" => "http://www.crossref.org/"
  }
                                
  super(config)                              
end
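The constructor assigns its defaults before calling super(config), which suggests the superclass copies config keys onto instance variables so that services.yml values replace the defaults. The sketch below assumes exactly that behavior; HypotheticalService and HypotheticalSfx are stand-ins for illustration, not Umlaut's actual Service class.

```ruby
# Hypothetical stand-in for a Service base class that copies each config
# key onto an instance variable of the same name (an assumption).
class HypotheticalService
  def initialize(config)
    config.each do |key, value|
      instance_variable_set("@#{key}", value)
    end
  end
end

class HypotheticalSfx < HypotheticalService
  attr_reader :sfx_timeout, :really_distant_relationships

  def initialize(config)
    # Defaults first, mirroring Sfx#initialize...
    @sfx_timeout = 8
    @really_distant_relationships = ["CONTINUES_IN_PART", "CONTINUED_IN_PART_BY",
                                     "ABSORBED_IN_PART", "ABSORBED_BY"]
    # ...then let config values overwrite them.
    super(config)
  end
end

service = HypotheticalSfx.new("really_distant_relationships" => [])
puts service.sfx_timeout                          # prints 8 (default kept)
puts service.really_distant_relationships.inspect # prints [] (default replaced)
```

Under this assumption, ordering matters: a default assigned after super would silently win over the configured value.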

Class Method Details

.parse_perl_data(doc) ⇒ Object

Class method that parses an SFX perl_data block into an OpenURL::ContextObject. The argument is a Nokogiri document (parsed from the XML string SFX returns) containing the SFX <perldata> element and its children.



# File 'lib/service_adaptors/sfx.rb', line 448

def self.parse_perl_data(doc)        

  co = OpenURL::ContextObject.new
  co.referent.set_format('journal') # default

  html_ent_coder = HTMLEntities.new 
  
  doc.search('perldata/hash/item').each do |item|
    key = item['key'].to_s
          
    value = item.inner_text
    
    # SFX sometimes returns invalid UTF-8 (is it really ISO 8859? Is it
    # predictable? Who knows). If it's not valid, it'll cause all
    # sorts of problems later, so if it's not valid we just
    # ignore it, sorry.
    next unless value.valid_encoding?

    # Some normalization. SFX uses rft.year, which is not actually
    # legal. Stick it in rft.date instead.
    key = "rft.date" if key == "rft.year"

    prefix, stripped = key.split('.')
          
    # The auinit1 value is COMPLETELY messed up for reasons I do not know.
    # Double encoded in bizarre ways.
    next if key == '@rft.auinit1' || key == '@rft.auinit'

    # Darn multi-value SFX hackery, indicated with keys beginning
    # with '@'. Just take the first one,
    # our context object can't store more than one. Then regularize the
    # key name. 
    if (prefix == '@rft')
      array_items = item.search("array/item")
      array_i = array_items[0] unless array_items.blank?
      
      prefix = prefix.slice(1, prefix.length)
      value = array_i ? array_i.inner_text : nil   
    end
    
    # But this still has HTML entities in it sometimes. Now we've
    # got to decode THAT.
    # TODO: Are we sure we need to do this? We need an example
    # from SFX result to test, it's potentially expensive.       
    value = html_ent_coder.decode(value)

    # object_type? Fix that to be the right way.
    if (prefix == 'rft') && (stripped == 'object_type')
      co.referent.set_format( value.downcase )
      next
    end
    
    if (prefix == 'rft' && value)
      co.referent.set_metadata(stripped, value)
    end

    if (prefix == '@rft_id')
      identifiers = item.search('array/item')
      identifiers.each do |id|
        co.referent.add_identifier(id.inner_text)
      end
    end
    if (prefix == '@rfr_id')
      identifiers = item.search('array/item')
      identifiers.each do |id|
        co.referrer.add_identifier(id.inner_text)
      end
    end
  end
  return co
end
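The key-handling rules in parse_perl_data can be illustrated without Nokogiri or the openurl gem. This sketch assumes the <item> elements have already been extracted into [key, value] pairs, where a multi-valued "@"-prefixed key carries an Array; normalize_perl_data_keys is a hypothetical helper that returns a plain Hash of referent metadata instead of a ContextObject, and it leaves out HTML-entity decoding and identifier handling.

```ruby
# Hypothetical helper: normalizes SFX perl_data keys into referent metadata.
# Input: array of [key, value] pairs; multi-valued "@" keys carry an Array.
def normalize_perl_data_keys(pairs)
  metadata = {}
  pairs.each do |key, value|
    # rft.year is not a legal OpenURL key; SFX's value goes in rft.date.
    key = "rft.date" if key == "rft.year"
    # The auinit values from SFX are unreliable, so drop them.
    next if key == "@rft.auinit1" || key == "@rft.auinit"

    prefix, stripped = key.split(".")
    if prefix == "@rft"
      # Multi-valued: keep only the first entry, then drop the "@".
      value = Array(value).first
      prefix = "rft"
    end
    # Skip values that are missing or not valid in their encoding.
    next if value.nil? || !value.valid_encoding?

    metadata[stripped] = value if prefix == "rft"
  end
  metadata
end
```

Keys with other prefixes, such as sfx.request_id, are ignored here; the real method reads those separately.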

.pass_through_url(response) ⇒ Object

Tries to build a reverse-engineered URL that takes the user THROUGH SFX to their destination, so that SFX captures the click for statistics. This relies on certain information from the original SFX response having been stored in the Response object. Used by the sfx_backchannel_record service.



# File 'lib/service_adaptors/sfx.rb', line 414

def self.pass_through_url(response)
  base_url = response[:sfx_base_url]    
  
  sfx_resolver_cgi_url = base_url + "/cgi/core/sfxresolver.cgi"

  dataString = "?tmp_ctx_svc_id=#{response[:sfx_target_index]}"
  dataString += "&tmp_ctx_obj_id=#{response[:sfx_obj_index]}"

  # Don't understand what this is, but it sometimes needs to be 1?
  # Hopefully it won't mess anything up when it's not necessary.
  # Really have no idea when it would need to be something other
  # than 1.
  # Nope, sad to say it does mess up cases where it is not necessary.
  # Grr.
  #dataString += "&tmp_parent_ctx_obj_id=1"

  # NOTE: URI.escape was deprecated and then removed in Ruby 3.0; newer
  # Rubies need a replacement encoder for the citation values below.
  dataString += "&service_id=#{response[:sfx_target_service_id]}"
  dataString += "&request_id=#{response[:sfx_request_id]}"
  dataString += "&rft.year="
  dataString += URI.escape(response[:citation_year].to_s) if response[:citation_year]
  dataString += "&rft.volume="
  dataString += URI.escape(response[:citation_volume].to_s) if response[:citation_volume]
  dataString += "&rft.issue="
  dataString += URI.escape(response[:citation_issue].to_s) if response[:citation_issue]
  dataString += "&rft.spage="
  dataString += URI.escape(response[:citation_spage].to_s) if response[:citation_spage]

  return sfx_resolver_cgi_url + dataString
end
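Since URI.escape no longer exists in Ruby 3.0 and later, the hypothetical helper below rebuilds the same query string with ERB::Util.url_encode from the standard library. It takes a plain Hash with the keys pass_through_url reads from the Response. The escaping is stricter than URI.escape (url_encode also escapes characters such as "/" and ":"), so treat it as a sketch rather than a drop-in replacement.

```ruby
require 'erb'

# Hypothetical helper mirroring Sfx.pass_through_url; `response` is a plain Hash.
def sketch_pass_through_url(response)
  base = "#{response[:sfx_base_url]}/cgi/core/sfxresolver.cgi"
  # Escape a citation value, or use "" when it is absent (as the original does).
  esc = ->(key) { response[key] ? ERB::Util.url_encode(response[key].to_s) : "" }

  query = "?tmp_ctx_svc_id=#{response[:sfx_target_index]}" \
          "&tmp_ctx_obj_id=#{response[:sfx_obj_index]}" \
          "&service_id=#{response[:sfx_target_service_id]}" \
          "&request_id=#{response[:sfx_request_id]}" \
          "&rft.year=#{esc.(:citation_year)}" \
          "&rft.volume=#{esc.(:citation_volume)}" \
          "&rft.issue=#{esc.(:citation_issue)}" \
          "&rft.spage=#{esc.(:citation_spage)}"
  base + query
end
```

With a base URL of http://sfx.example.edu/sfx_local (made up) and a start page of "100 A", the spage parameter comes out as rft.spage=100%20A.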

Instance Method Details

#base_url ⇒ Object



# File 'lib/service_adaptors/sfx.rb', line 92

def base_url
  return @base_url
end

#do_request(client) ⇒ Object



# File 'lib/service_adaptors/sfx.rb', line 147

def do_request(client)
  client.transport_inline    
  return client.response
end

#expired_sfx_request(response) ⇒ Object

Using the value of sfx_requests_expire_crontab, determines whether the Umlaut service response is so old that it can no longer be used for SFX click passthrough.



# File 'lib/service_adaptors/sfx.rb', line 389

def expired_sfx_request(response)
  require 'CronTab'

  crontab_str = @sfx_requests_expire_crontab

  return false unless crontab_str # no param, no determination possible
  
  crontab = CronTab.new( crontab_str )

  time_of_response = response.created_at

  return false unless time_of_response # no recorded time, not possible either

  expire_time = crontab.nexttime( time_of_response )

  # Give an extra five minutes of time, in case the expire
  # process takes up to five minutes to finish. 
  return( Time.now > (expire_time + 5.minutes) )    
end
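expired_sfx_request relies on the CronTab library's nexttime. To show the arithmetic without that dependency, the sketch below handles only the simplest case, a daily "M H * * *" pattern, through the hypothetical helpers next_daily_run and expired_for_passthrough?, and applies the same five-minute grace period. It ignores daylight-saving shifts.

```ruby
# Hypothetical: next time after `time` when a daily "M H * * *" job runs.
# Only daily patterns are handled; general crontab parsing is out of scope.
def next_daily_run(time, minute, hour)
  candidate = Time.new(time.year, time.month, time.day, hour, minute, 0, time.utc_offset)
  candidate += 24 * 60 * 60 if candidate <= time # already passed today
  candidate
end

GRACE_SECONDS = 5 * 60 # same five-minute allowance as the original

# True once the expiry job after `created_at` (plus grace) has passed.
def expired_for_passthrough?(created_at, now, minute:, hour:)
  return false unless created_at # no recorded time, no determination possible
  now > next_daily_run(created_at, minute, hour) + GRACE_SECONDS
end
```

For a response created at 01:00 with a 02:00 expiry job, passthrough is still allowed at 02:04 and considered expired from 02:05 onward.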

#handle(request) ⇒ Object



# File 'lib/service_adaptors/sfx.rb', line 96

def handle(request)
  client = self.initialize_client(request)
  begin
    response = self.do_request(client)
    self.parse_response(response, request)
    return request.dispatched(self, true)
  rescue Errno::ETIMEDOUT, Timeout::Error => e
    # Request to SFX timed out. Record this as unsuccessful in the dispatch table. Temporary.
    return request.dispatched(self, DispatchedService::FailedTemporary, e)
  end
end
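handle records a timeout as a temporary failure instead of letting it propagate, while any other exception still escapes. The self-contained sketch below imitates that control flow; FakeRequest, sketch_handle and the :failed_temporary symbol are hypothetical stand-ins for Umlaut's request object and DispatchedService::FailedTemporary.

```ruby
require 'timeout'

# Hypothetical stand-in that just records how the service was dispatched.
class FakeRequest
  attr_reader :status, :error

  def dispatched(_service, status, error = nil)
    @status = status
    @error = error
  end
end

def sketch_handle(request)
  yield # stands in for do_request + parse_response
  request.dispatched(:sfx, true)
rescue Errno::ETIMEDOUT, Timeout::Error => e
  # Timed out: record a temporary (not permanent) failure.
  request.dispatched(:sfx, :failed_temporary, e)
end
```

Only the two timeout exception classes are rescued, so a parsing error such as the "SFX did not return expected response" exception would still surface.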

#initialize_client(request) ⇒ Object



# File 'lib/service_adaptors/sfx.rb', line 108

def initialize_client(request)
  transport = OpenURL::Transport.new(@base_url, nil, :open_timeout => @sfx_timeout, :read_timeout => @sfx_timeout)
  
  context_object = request.to_context_object
  
  ## SFX HACK WORKAROUND
  # SFX will parse private_data/pid/rft_dat containing ERIC, when sid/rfr_id
  # is CSA. But it only expects an OpenURL 0.1 to do this. We send it a
  # 1.0. To get it to recognize it anyway, we need to send it a blank
  # url_ver/ctx_ver
  if ( context_object.referrer.identifiers.find {|i| i.start_with? "info:sid/CSA"} &&
       context_object.referent.private_data != nil)
    context_object.openurl_ver = ""
  end
  
  transport.add_context_object(context_object)
  transport.extra_args["sfx.response_type"]="multi_obj_xml"
    
  @get_coverage = false    
  
  metadata = request.referent.metadata
  if ( metadata['date'].blank? &&
       metadata['year'].blank? &&
       (! request.referent.identifiers.find {|i| i =~ /^info\:(doi|pmid)/})
      )
    # No article-level metadata, do some special stuff. 
    transport.extra_args["sfx.ignore_date_threshold"]="1"
    transport.extra_args["sfx.show_availability"]="1"
    @get_coverage = true
  end
  # Workaround to SFX bug; not sure if this is really still necessary.
  # I think it's not, but leave it in anyway just in case.
  if (context_object.referent.identifiers.find {|i| i =~ /^info:doi\// })
    transport.extra_args['sfx.doi_url']='http://dx.doi.org'
  end
  
  return transport
end
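The decision about whether to ask SFX for coverage information can be isolated as a pure function. In the hypothetical sketch below, extra_sfx_args takes a metadata Hash and an identifier list (assumed input shapes) and returns the extra SFX arguments initialize_client would add, together with the value it would assign to @get_coverage. A small lambda stands in for ActiveSupport's blank?.

```ruby
# Hypothetical helper: returns [extra_args, get_coverage] for a citation.
def extra_sfx_args(metadata, identifiers)
  args = {}
  missing = ->(v) { v.nil? || v.to_s.strip.empty? } # rough stand-in for blank?
  no_article_level = missing.(metadata['date']) &&
                     missing.(metadata['year']) &&
                     identifiers.none? { |i| i =~ /^info:(doi|pmid)/ }
  if no_article_level
    # Title-level request: ask SFX for availability/coverage text.
    args["sfx.ignore_date_threshold"] = "1"
    args["sfx.show_availability"] = "1"
  end
  # DOI workaround kept from the original.
  args["sfx.doi_url"] = "http://dx.doi.org" if identifiers.any? { |i| i =~ /^info:doi\// }
  [args, no_article_level]
end
```

A bare ISSN request therefore gets coverage, while anything with a date, a year, a DOI or a PMID is treated as an article-level lookup.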

#parse_response(resolver_response, request) ⇒ Object



# File 'lib/service_adaptors/sfx.rb', line 152

def parse_response(resolver_response, request)
  doc = Nokogiri::XML(resolver_response)     

  # Catch an SFX error message (in HTML) that's not an XML
  # document at all.
  unless doc.at('/ctx_obj_set')
    Rails.logger.error("sfx.rb: SFX did not return expected response. SFX response: #{resolver_response}")
    raise "SFX did not return expected response."
  end

  # There can be several context objects in the response.
  # We need to keep track of which data comes from which, for
  # SFX click-through generating et alia
  sfx_objs = doc.search('/ctx_obj_set/ctx_obj')

  # As we go through the possibly multiple SFX context objects,
  # we need to keep track of which one, if any, we want to use
  # to enhance the Umlaut referent metadata.
  #
  # We only enhance for journal type metadata. For book type
  # metadata SFX will return something, but it may not be the manifestation
  # we want. With journal titles, less of an issue. 
  #
  # In case of multiple SFX hits, enhance metadata only from the
  # one that actually had fulltext. If more than one had fulltext, forget it,
  # too error prone. If none had full text, just pick the first. 
  #
  # We'll use these variables to keep track of our 'best fit' as
  # we loop through em.     
  best_fulltext_ctx = nil
  best_nofulltext_ctx = nil

  # We're going to keep our @really_distant_relationship stuff here. 
  related_titles = {}
  
  0.upto(sfx_objs.length - 1 ) do |sfx_obj_index|
  
    sfx_obj = sfx_objs[sfx_obj_index]

    # Get out the "perl_data" section, with our actual OpenURL style
    # context object information. This was XML escaped as a String (actually
    # double-escaped, weirdly), so
    # we need to extract the string, unescape it, and then feed it to Nokogiri
    # again. 
    ctx_obj_atts = sfx_obj.at('./ctx_obj_attributes').inner_text

    perl_data = Nokogiri::XML( ctx_obj_atts )
    # parse it into an OpenURL, we might need it like that. 
    sfx_co = Sfx.parse_perl_data(perl_data)
    sfx_metadata = sfx_co.to_hash

    # get SFX objectID
    object_id_node =
      perl_data.at("./perldata/hash/item[@key='rft.object_id']")
    object_id = object_id_node ? object_id_node.inner_text : nil

    # Get SFX requestID
    request_id_node =
      perl_data.at("./perldata/hash/item[@key='sfx.request_id']")
    request_id = request_id_node ? request_id_node.inner_text : nil

    # Get targets service ids
    sfx_target_service_ids =
      sfx_obj.search('ctx_obj_targets/target/target_service_id').collect {|e| e.inner_text}

    metadata = request.referent.metadata

    # For each target delivered by SFX
    sfx_obj.search("./ctx_obj_targets/target").each_with_index do |target, target_index|
      response_data = {}

      # First check @extra_targets_of_interest
      sfx_target_name = target.at('./target_name').inner_text
      umlaut_service = @extra_targets_of_interest[sfx_target_name]

      # If not found, look for it in services_of_interest
      unless ( umlaut_service )
        sfx_service_type = target.at("./service_type").inner_text
        umlaut_service = @services_of_interest[sfx_service_type]
      end

      # If we have multiple context objs, skip the ill and ask-a-librarian
      # links for all but the first, to avoid dups. This is a bit messy,
      # but this whole multiple hits thing is messy.
      if ( sfx_obj_index > 0 &&
           ( umlaut_service == 'document_delivery' ||
             umlaut_service == 'export_citation' ||
             umlaut_service == 'help'))
        next
      end

      # Okay, keep track of best fit ctx for metadata enhancement
      if request.referent.format == "journal"
        if ( umlaut_service == 'fulltext')
          best_fulltext_ctx = perl_data
          best_nofulltext_ctx = nil
        elsif best_nofulltext_ctx == nil
          best_nofulltext_ctx = perl_data
        end
      end

      if ( umlaut_service ) # Okay, it's in services or targets of interest
        if (target/"./displayer")
          source = "SFX/" + (target/"./displayer").inner_text
        else
          source = "SFX" + URI.parse(self.url).path
        end

        target_service_id = (target/"./target_service_id").inner_text

        coverage = nil
        if ( @get_coverage )
          # Make sure you turn on "Include availability info in text format"
          # in the SFX Admin API configuration.
          thresholds_str = ""
          target.search('coverage/coverage_text/threshold_text/coverage_statement').each do |threshold|
            thresholds_str += threshold.inner_text.to_s + ".\n"
          end

          embargoes_str = ""
          target.search('coverage/coverage_text/embargo_text/embargo_statement').each do |embargo|
            embargoes_str += embargo.inner_text.to_s + ".\n"
          end

          unless ( thresholds_str.blank? && embargoes_str.blank? )
            coverage = thresholds_str + embargoes_str
          end
        end

        related_note = ""
        # If this is from a related object, add that on as a note too...
        # And maybe skip this entirely!
        if (related_node = target.at('./related_service_info'))
          relationship = related_node.at('./relation_type').inner_text
          issn = related_node.at('./related_object_issn').inner_text
          sfx_object_id = related_node.at('./related_object_id').inner_text
          title = related_node.at('./related_object_title').inner_text

          if @really_distant_relationships.include?(relationship)
            # Show title-level link in see-also instead of full text.
            related_titles[issn] = {
              :sfx_object_id => sfx_object_id,
              :title => title,
              :relationship => relationship,
              :issn => issn
            }

            next
          end

          related_note = "This version provided from related title:  <i>" + CGI.unescapeHTML( title ) + "</i>.\n"
        end

        if ( sfx_service_type == 'getDocumentDelivery' )
          value_string = request_id
        else
          value_string = (target/"./target_service_id").inner_text
        end

        response_data[:url] = CGI.unescapeHTML((target/"./target_url").inner_text)
        response_data[:notes] = related_note.to_s + CGI.unescapeHTML((target/"./note").inner_text)
        response_data[:authentication] = CGI.unescapeHTML((target/"./authentication").inner_text)
        response_data[:source] = source
        response_data[:coverage] = coverage if coverage

        # Sfx metadata we want
        response_data[:sfx_base_url] = @base_url
        response_data[:sfx_obj_index] = sfx_obj_index + 1 # sfx is 1 indexed
        response_data[:sfx_target_index] = target_index + 1
        # sometimes the sfx.request_id is missing, go figure.
        if request_id = (perl_data/"//hash/item[@key='sfx.request_id']").first
          response_data[:sfx_request_id] = request_id.inner_text
        end
        response_data[:sfx_target_service_id] = target_service_id
        response_data[:sfx_target_name] = sfx_target_name
        # At url-generation time, the request isn't available to us anymore,
        # so we better store this citation info here now, since we need it
        # for sfx click passthrough

        # Oops, need to take this from SFX delivered metadata.

        response_data[:citation_year] = sfx_metadata['rft.date'].to_s[0,4] if sfx_metadata['rft.date']
        response_data[:citation_volume] = sfx_metadata['rft.volume']
        response_data[:citation_issue] = sfx_metadata['rft.issue']
        response_data[:citation_spage] = sfx_metadata['rft.spage']

        # Some debug info
        response_data[:debug_info] = " Target: #{sfx_target_name} ; SFX object ID: #{object_id}"

        response_data[:display_text] = (target/"./target_public_name").inner_text

        request.add_service_response(
          response_data.merge(
            :service => self,
            :service_type_value => umlaut_service
          ))
      end
    end
  end

  # Add in links to our related titles
  related_titles.each_pair do |issn, hash|
    request.add_service_response(        
       :service => self,
       :display_text => "#{sfx_relationship_display(hash[:relationship])}: #{hash[:title]}",
       :notes => "#{ServiceTypeValue['fulltext'].display_name} available",
       :related_object_hash => hash, 
       :service_type_value => "highlighted_link")
  end
  
  # Did we find a ctx best fit for enhancement?
  if best_fulltext_ctx
    enhance_referent(request, best_fulltext_ctx)
  elsif best_nofulltext_ctx
    enhance_referent(request, best_nofulltext_ctx)
  end
  
end
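The bookkeeping that picks which SFX context object enhances the referent (done only for journal-format requests) can be sketched as a function over the Umlaut service types found for each context object. The hypothetical best_fit_index mirrors the loop as written: a fulltext target replaces any earlier choice, so when several context objects have fulltext the last one wins, despite the comment's intent to give up in that case; otherwise the first context object with any unskipped target is used.

```ruby
# Targets of these types are skipped for every context object but the first.
SKIPPED_AFTER_FIRST = ['document_delivery', 'export_citation', 'help'].freeze

# Hypothetical: index of the context object used for metadata enhancement.
# ctx_service_types[i] lists the umlaut service type (or nil) of each target.
def best_fit_index(ctx_service_types)
  best_fulltext = nil
  best_nofulltext = nil
  ctx_service_types.each_with_index do |types, ctx_index|
    types.each do |type|
      next if ctx_index > 0 && SKIPPED_AFTER_FIRST.include?(type)
      if type == 'fulltext'
        best_fulltext = ctx_index # a later fulltext hit overwrites an earlier one
        best_nofulltext = nil
      elsif best_nofulltext.nil?
        best_nofulltext = ctx_index
      end
    end
  end
  best_fulltext || best_nofulltext
end
```

Note that in parse_response this bookkeeping runs before the really_distant_relationships check, so a fulltext target from a distant related title still counts as a fulltext hit.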

#response_url(service_response, submitted_params) ⇒ Object

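Custom URL generation for the related-title case: if the service response carries a :related_object_hash (a "really distant" related title shown as a highlighted_link), returns URL options for a new resolve request on that title; otherwise returns the URL stored from SFX.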



# File 'lib/service_adaptors/sfx.rb', line 524

def response_url(service_response, submitted_params)
  if (related_object = service_response.data_values[:related_object_hash])
    {:controller => 'resolve', "rft.issn" => related_object[:issn], "rft.title" => related_object[:title], "rft.object_id" => related_object[:sfx_object_id] }
  else
    service_response['url']
  end        
end
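The two branches of response_url can be exercised with plain hashes. In the hypothetical sketch below, sketch_response_url takes the response's data values as a Hash plus the stored URL, which is an assumption about their shape; the first branch's Hash appears to be url_for-style options pointing at the resolve controller.

```ruby
# Hypothetical stand-in for Sfx#response_url over plain-Hash inputs.
def sketch_response_url(data_values, stored_url)
  if (related = data_values[:related_object_hash])
    # Related title: start a new resolve request for that title instead.
    { :controller => 'resolve',
      "rft.issn" => related[:issn],
      "rft.title" => related[:title],
      "rft.object_id" => related[:sfx_object_id] }
  else
    stored_url # ordinary SFX target: use the URL SFX supplied
  end
end
```

Passing the SFX object id along presumably lets SFX resolve the related title directly rather than re-matching it by ISSN and title.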

#service_types_generated ⇒ Object

Standard method, used by auto background updater. See Service docs.



# File 'lib/service_adaptors/sfx.rb', line 83

def service_types_generated
  service_strings = []
  service_strings.concat( @services_of_interest.values() )
  service_strings.concat( @extra_targets_of_interest.values() )
  service_strings.uniq!

  return service_strings.collect { |s| ServiceTypeValue[s] }
end

#sfx_click_passthrough ⇒ Object



# File 'lib/service_adaptors/sfx.rb', line 381

def sfx_click_passthrough
  # From config, or default to false. 
  return @click_passthrough || false
end