Class: Blacklight
- Inherits:
-
Service
  - Object
    - Service
      - Blacklight
- Includes:
- MarcHelper, MetadataHelper, UmlautHttp, XmlSchemaHelper
- Defined in:
- lib/service_adaptors/blacklight.rb
Overview
Searches a Blacklight instance that has the CQL extension installed.
Params include:
- base_url
-
required. Complete URL to catalog.atom action. Eg "blacklight.mse.jhu.edu/catalog.atom"
- bl_fields
-
required with at least some entries if you want this to do anything. Describe the names of given semantic fields in your BL instance.
-
issn
-
isbn
-
lccn
-
oclcnum
-
id (defaults to 'id')
-
title
-
author
-
serials_limit_clause => not an index name, full URL clause for a limit to apply to known serials searches, for instance "f[]=Serial"
-
- identifier_search
-
Do catalog search on issn/isbn/oclcnum/lccn/bibId. Default true.
- keyword_search
-
Do catalog search on title/author keywords where applicable. Generally only used when identifier_search finds no hits, if identifier_search is on. Default true.
- keyword_per_page
-
How many records to fetch from Blacklight when doing keyword searches. Default 10.
- exclude_holdings
-
Can be used to exclude certain 'dummy' holdings that have certain collection, location, or other values. Eg:
exclude_holdings:
  collection_str:
    - World Wide Web
    - Internet
- rft_id_bibnum_prefixes
-
Array of URI prefixes in an rft_id that indicate that the actual solr id comes next. For instance, if your blacklight will send "blacklight.com/catalog/some_id" in an rft_id, then include "blacklight.com/catalog/". Optional.
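As a rough illustration of how the parameters above fit together, here is a minimal sketch of a configuration hash. The host name and every Solr field name on the right-hand side are hypothetical placeholders, and the use of string keys is an assumption about how the configuration reaches the constructor.

```ruby
# Hypothetical configuration for the Blacklight service adaptor.
# The host and the index names ("issn_search", "title_cql", ...) are
# placeholders; substitute the names used by your own Blacklight instance.
config = {
  "base_url" => "http://blacklight.example.edu/catalog.atom",
  "bl_fields" => {
    "issn"    => "issn_search",
    "isbn"    => "isbn_search",
    "lccn"    => "lccn_search",
    "oclcnum" => "oclcnum_search",
    "title"   => "title_cql",
    "author"  => "author_cql",
    # Not an index name: a full URL clause appended to known-serial searches.
    "serials_limit_clause" => "f[]=Serial"
  },
  "identifier_search" => true,
  "keyword_search"    => true,
  "keyword_per_page"  => 10,
  "exclude_holdings"  => { "collection_str" => ["World Wide Web", "Internet"] },
  "rft_id_bibnum_prefixes" => ["http://blacklight.example.edu/catalog/"]
}

# "id" is not listed above, so it falls back to the documented default 'id'.
bl_fields = { "id" => "id" }.merge(config["bl_fields"])
puts bl_fields["id"]
```

Leaving "id" out of bl_fields is enough when the Solr unique key is literally named id; set it explicitly otherwise.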
Constant Summary
Constants inherited from Service
Service::LinkOutFilterTask, Service::StandardTask
Instance Attribute Summary
-
- (Object) base_url
readonly
Returns the value of attribute base_url.
-
- (Object) bl_fields
readonly
Returns the value of attribute bl_fields.
-
- (Object) cql_search_field
readonly
Returns the value of attribute cql_search_field.
-
- (Object) issn
readonly
Returns the value of attribute issn.
Attributes inherited from Service
#name, #priority, #request, #service_id, #session_id, #status, #task, #url
Instance Method Summary
-
- (Object) add_holdings(holdings_url, options = {})
Takes a url that will return atom response of dlf_expanded content.
-
- (Object) bib_ids_from_atom_entries(entries)
-
- (Object) blacklight_keyword_search_url(request, options = {})
Construct a CQL search against blacklight for author and title, possibly with serial limit.
-
- (Object) blacklight_precise_search_url(request, format = "marc")
Send a CQL request for any identifiers present.
-
- (Object) blacklight_url_for_ids(ids, format = "dlf_expanded")
-
- (Object) filter_keyword_entries(atom_entries, options = {})
-
- (Object) get_solr_id(rft)
-
- (Object) handle(request)
-
- (Blacklight) initialize(config)
constructor
A new instance of Blacklight.
-
- (Object) service_types_generated
Standard method, used by background service updater.
Methods included from XmlSchemaHelper
xml_ns, #xml_ns, #xml_to_holdings
Methods included from MarcHelper
#add_856_links, #edition_statement, #get_title, #get_years, #gmd_values, #service_type_for_856, #should_skip_856_link?, #strip_gmd
Methods included from UmlautHttp
#http_fetch, #proxy_like_headers
Methods inherited from Service
#display_name, #handle_wrapper, #link_out_filter, #preempted_by, required_config_params, #response_to_view_data, #response_url, #session, #update_session, #view_data_from_service_type
Constructor Details
- (Blacklight) initialize(config)
A new instance of Blacklight
# File 'lib/service_adaptors/blacklight.rb', line 46
def initialize(config)
  # defaults
  # If you are sending an OpenURL from a library service, you may
  # have the HIP bibnum, and include it in the OpenURL as, eg.
  # rft_id=http://catalog.library.jhu.edu/bib/343434 (except URL-encoded)
  # Then you'd set rft_id_bibnum_prefixes to ["http://catalog.library.jhu.edu/bib/"]
  @rft_id_bibnum_prefixes = []
  @cql_search_field = "cql"
  @keyword_per_page = 10
  @identifier_search = true
  @keyword_search = true
  @link_to_search = true
  super(config)
  # Default the id field to "id" (no trailing space) unless configured.
  @bl_fields = { "id" => "id" }.merge(@bl_fields)
end
Instance Attribute Details
- (Object) base_url (readonly)
Returns the value of attribute base_url
# File 'lib/service_adaptors/blacklight.rb', line 38
def base_url
  @base_url
end
- (Object) bl_fields (readonly)
Returns the value of attribute bl_fields
# File 'lib/service_adaptors/blacklight.rb', line 39
def bl_fields
  @bl_fields
end
- (Object) cql_search_field (readonly)
Returns the value of attribute cql_search_field
# File 'lib/service_adaptors/blacklight.rb', line 38
def cql_search_field
  @cql_search_field
end
- (Object) issn (readonly)
Returns the value of attribute issn
# File 'lib/service_adaptors/blacklight.rb', line 39
def issn
  @issn
end
Instance Method Details
- (Object) add_holdings(holdings_url, options = {})
Takes a url that will return atom response of dlf_expanded content. Adds Umlaut "holding" ServiceResponses for dlf_expanded, as appropriate. Returns number of holdings added.
# File 'lib/service_adaptors/blacklight.rb', line 225
def add_holdings(holdings_url, options = {})
  options[:match_reliability] ||= ServiceResponse::MatchExact
  options[:marc_data] ||= {}

  atom = Nokogiri::XML( http_fetch(holdings_url).body )
  content_entries = atom.search("/atom:feed/atom:entry/atom:content", xml_ns)

  # For each atom entry, find the dlf_expanded record. For each dlf_expanded
  # record, take all of its holdingsrecs if it has them, or all of its
  # items if it doesn't, and add them to the list. We wind up with a list
  # of mixed holdingsrecs and items.
  holdings_xml = content_entries.collect do |content_entry|
    copies = content_entry.xpath("dlf:record/dlf:holdings/dlf:holdingset/dlf:holdingsrec", xml_ns)
    copies.length > 0 ? copies : content_entry.xpath("dlf:record/dlf:items/dlf:item", xml_ns)
  end.flatten

  service_data = holdings_xml.collect do |holding|
    atom_entry = holding.at_xpath("ancestor::atom:entry", xml_ns)
    atom_id = atom_entry.at_xpath("atom:id/text()", xml_ns).to_s
    edition_str = edition_statement(options[:marc_data][atom_id])
    url = atom_entry.at_xpath("atom:link[@rel='alternate'][@type='text/html']/attribute::href", xml_ns).to_s
    xml_to_holdings( holding ).merge(
      :service => self,
      :match_reliability => options[:match_reliability],
      :edition_str => edition_str,
      :url => url
    )
  end

  # Strip out holdings that aren't really holdings. exclude_holdings is
  # optional, so treat a missing value as "exclude nothing".
  service_data.delete_if do |data|
    (@exclude_holdings || {}).collect do |key, values|
      values.include?(data[key.to_sym])
    end.include?(true)
  end

  # Sort by "collection". sort! sorts in place (plain sort discards its
  # result); to_s keeps a nil collection from raising.
  service_data.sort! do |a, b|
    a[:collection_str].to_s <=> b[:collection_str].to_s
  end

  service_data.each do |data|
    request.add_service_response(data.merge(:service => self), ["holding"])
  end

  return service_data.length
end
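The exclusion step in add_holdings can be tried in isolation. The sketch below is a standalone approximation: plain hashes stand in for the data that xml_to_holdings would return, and the sample collection names and call number are made up for illustration.

```ruby
# Stand-in for the exclude_holdings configuration value.
exclude_holdings = { "collection_str" => ["World Wide Web", "Internet"] }

# Stand-ins for holdings hashes; the real ones come from xml_to_holdings.
service_data = [
  { :collection_str => "Main Stacks",    :call_number => "QA76 .B3" },
  { :collection_str => "World Wide Web", :call_number => nil },
  { :collection_str => "Internet",       :call_number => nil }
]

# Drop a holding when any configured key holds one of the excluded values.
# Config keys are strings, holdings keys are symbols, hence key.to_sym.
service_data.delete_if do |data|
  exclude_holdings.any? { |key, values| values.include?(data[key.to_sym]) }
end

puts service_data.map { |d| d[:collection_str] }.inspect
```

Only the "Main Stacks" holding survives, since the other two carry excluded collection values.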
- (Object) bib_ids_from_atom_entries(entries)
# File 'lib/service_adaptors/blacklight.rb', line 304
def bib_ids_from_atom_entries(entries)
  entries.xpath("atom:id/text()", xml_ns).to_a.collect do |atom_id|
    atom_id.to_s =~ /([^\/]+)$/
    $1
  end.compact
end
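The regular expression above keeps the last path segment of each atom id. A small sketch of that extraction on plain strings follows; the helper name last_path_segment and the example URLs are hypothetical.

```ruby
# Hypothetical helper: returns the text after the final "/" in an atom id,
# or nil when the id ends in "/" and so has no final segment.
def last_path_segment(atom_id)
  atom_id.to_s[/([^\/]+)$/, 1]
end

atom_ids = [
  "http://blacklight.example.edu/catalog/bib_12345",
  "http://blacklight.example.edu/catalog/"
]

# compact drops the nil produced by the id with a trailing slash.
ids = atom_ids.map { |atom_id| last_path_segment(atom_id) }.compact
puts ids.inspect
```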
- (Object) blacklight_keyword_search_url(request, options = {})
Construct a CQL search against blacklight for author and title, possibly with serial limit. Ask for Atom with embedded MARC back.
# File 'lib/service_adaptors/blacklight.rb', line 192
def blacklight_keyword_search_url(request, options = {})
  options[:format] ||= "atom"
  options[:content_format] ||= "marc"

  clauses = []

  # We need both title and author to search keyword style, or
  # we get too many false positives. Except serials we'll do
  # title only. sigh, logic tree.
  title = get_search_title(request.referent)
  author = get_top_level_creator(request.referent)
  return nil unless title && (author || (@bl_fields["serials_limit_clause"] && title_is_serial?(request.referent)))

  # phrase search for title, just raw dismax for author
  # Embed quotes inside the quoted value, need to backslash-quote for CQL,
  # and backslash the backslashes for ruby literal.
  clauses.push("#{@bl_fields["title"]} = \"\\\"#{title}\\\"\"")
  clauses.push("#{@bl_fields["author"]} = \"#{author}\"") if author

  url = base_url + "?search_field=#{@cql_search_field}&content_format=#{options[:content_format]}&q=#{CGI.escape(clauses.join(" AND "))}"

  if (@bl_fields["serials_limit_clause"] && title_is_serial?(request.referent))
    url += "&" + @bl_fields["serials_limit_clause"]
  end

  return url
end
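The nested quoting in the title clause is easier to follow with concrete values. This sketch builds the same two clauses outside the class; the field names title_cql and author_cql and the sample title and author are hypothetical.

```ruby
require "cgi"

# Hypothetical field names and search values.
fields = { "title" => "title_cql", "author" => "author_cql" }
title  = "Annals of mathematics"
author = "Smith"

clauses = []
# The title is a phrase: CQL-quoted, with backslash-escaped quotes inside.
clauses.push("#{fields["title"]} = \"\\\"#{title}\\\"\"")
# The author is passed through as a plain quoted term.
clauses.push("#{fields["author"]} = \"#{author}\"")

cql = clauses.join(" AND ")
puts cql

# The whole CQL string is URL-escaped before being appended as the q parameter.
query = CGI.escape(cql)
puts query
```

The first puts shows the inner escaped quotes that mark the title as a phrase search, while the author term has only the outer CQL quotes.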
- (Object) blacklight_precise_search_url(request, format = "marc")
Send a CQL request for any identifiers present. Ask for an atom response with embedded marc21 back.
# File 'lib/service_adaptors/blacklight.rb', line 158
def blacklight_precise_search_url(request, format = "marc")
  # Add search clauses for our identifiers, if we have them and have a configured search field for them.
  clauses = []
  added = []
  ["lccn", "isbn", "oclcnum"].each do |key|
    if bl_fields[key] && request.referent.send(key)
      clauses.push( "#{bl_fields[key]} = \"#{request.referent.send(key)}\"")
      added << key
    end
  end
  # Only add ISSN if we don't have an ISBN, reduces false matches
  if ( !added.include?("isbn") && bl_fields["issn"] && request.referent.issn)
    clauses.push("#{bl_fields["issn"]} = \"#{request.referent.issn}\"")
  end

  # Add Solr document identifier if we can get one from the URL
  if (id = get_solr_id(request.referent))
    clauses.push("#{bl_fields['id']} = \"#{id}\"")
  end

  # if we have nothing, we can do no search.
  return nil if clauses.length == 0

  cql = clauses.join(" OR ")

  return base_url + "?search_field=#{@cql_search_field}&content_format=#{format}&q=#{CGI.escape(cql)}"
end
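To see which identifiers end up in the query, the clause-building logic can be run against a stand-in referent. In this sketch the Referent struct, the identifier_cql helper, and the field names are all hypothetical, and the Solr id clause is left out for brevity.

```ruby
# Hypothetical stand-in for request.referent, exposing only identifiers.
Referent = Struct.new(:lccn, :isbn, :oclcnum, :issn)

# Builds the OR-joined CQL string, or nil when no clause applies.
def identifier_cql(bl_fields, referent)
  clauses = []
  added = []
  %w[lccn isbn oclcnum].each do |key|
    value = referent.send(key)
    if bl_fields[key] && value
      clauses << "#{bl_fields[key]} = \"#{value}\""
      added << key
    end
  end
  # The ISSN is only searched when no ISBN clause was added.
  if !added.include?("isbn") && bl_fields["issn"] && referent.issn
    clauses << "#{bl_fields["issn"]} = \"#{referent.issn}\""
  end
  clauses.empty? ? nil : clauses.join(" OR ")
end

fields = { "isbn" => "isbn_t", "issn" => "issn_t", "oclcnum" => "oclc_t" }
referent = Referent.new(nil, "9780306406157", "12345", "1234-5679")
cql = identifier_cql(fields, referent)
puts cql
```

Here the ISSN is skipped because an ISBN clause is present, and no LCCN clause appears because no lccn field is configured.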
- (Object) blacklight_url_for_ids(ids, format = "dlf_expanded")
# File 'lib/service_adaptors/blacklight.rb', line 311
def blacklight_url_for_ids(ids, format="dlf_expanded")
  return nil unless ids.length > 0

  return base_url + "?search_field=#{@cql_search_field}&content_format=#{format}&q=" + CGI.escape("#{@bl_fields["id"]} any \"#{ids.join(" ")}\"")
end
- (Object) filter_keyword_entries(atom_entries, options = {})
# File 'lib/service_adaptors/blacklight.rb', line 275
def filter_keyword_entries(atom_entries, options = {})
  options[:exclude_ids] ||= []
  # Default to true only when the caller gave no value; ||= would also
  # overwrite an explicit false.
  options[:remove_subtitle] = true unless options.has_key?(:remove_subtitle)

  request_title_forms = [
    raw_search_title(request.referent).downcase,
    normalize_title( raw_search_title(request.referent) )
  ]
  request_title_forms << normalize_title( raw_search_title(request.referent), :remove_subtitle => true) if options[:remove_subtitle]
  # compact! modifies in place; plain compact discarded its result.
  request_title_forms.compact!

  # Only keep entries with title match, and that aren't in the
  # exclude_ids list.
  good_entries = atom_entries.find_all do |atom_entry|
    title = atom_entry.xpath("atom:title/text()", xml_ns).to_s

    entry_title_forms = [
      title.downcase,
      normalize_title(title)
    ]
    entry_title_forms << normalize_title(title, :remove_subtitle=>true) if options[:remove_subtitle]
    entry_title_forms.compact!

    ((entry_title_forms & request_title_forms).length > 0 &&
      (bib_ids_from_atom_entries(atom_entry) & options[:exclude_ids]).length == 0)
  end

  return Nokogiri::XML::NodeSet.new( atom_entries.document, good_entries)
end
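The title comparison works by intersecting sets of title forms. The sketch below illustrates the idea on plain strings; simple_normalize is a hypothetical stand-in and does not reproduce the behavior of the real normalize_title helper.

```ruby
# Hypothetical normalizer: lowercases, optionally drops a subtitle that
# follows ":" or ";", and replaces punctuation with spaces.
def simple_normalize(title, remove_subtitle = false)
  text = title.downcase
  text = text.split(/[:;]/, 2).first.to_s if remove_subtitle
  text.gsub(/[^a-z0-9 ]/, " ").squeeze(" ").strip
end

# All forms of a title that are allowed to match.
def title_forms(title, remove_subtitle = true)
  forms = [title.downcase, simple_normalize(title)]
  forms << simple_normalize(title, true) if remove_subtitle
  forms.uniq
end

request_forms = title_forms("Hamlet")
candidates = ["Hamlet: a tragedy", "Macbeth", "HAMLET"]

# A candidate is kept when it shares at least one form with the request.
matches = candidates.select { |title| (title_forms(title) & request_forms).any? }
puts matches.inspect
```

With subtitle removal on, "Hamlet: a tragedy" matches a request for "Hamlet"; calling title_forms with false for both sides would drop that match.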
- (Object) get_solr_id(rft)
# File 'lib/service_adaptors/blacklight.rb', line 318
def get_solr_id(rft)
  rft.identifiers.each do |id|
    @rft_id_bibnum_prefixes.each do |prefix|
      if id[0, prefix.length] == prefix
        return id[prefix.length, id.length]
      end
    end
  end

  return nil
end
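The prefix matching in get_solr_id can be sketched on plain arrays. The helper name solr_id_from, the prefix, and the identifiers below are hypothetical.

```ruby
# Hypothetical standalone version: returns whatever follows the first
# matching prefix, or nil when no identifier starts with any prefix.
def solr_id_from(identifiers, prefixes)
  identifiers.each do |id|
    prefixes.each do |prefix|
      return id[prefix.length..-1] if id.start_with?(prefix)
    end
  end
  nil
end

prefixes = ["http://blacklight.example.edu/catalog/"]
identifiers = [
  "urn:isbn:9780306406157",
  "http://blacklight.example.edu/catalog/bib_42"
]

puts solr_id_from(identifiers, prefixes).inspect
```

Identifiers that match no configured prefix, such as the urn:isbn value here, are skipped.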
- (Object) handle(request)
# File 'lib/service_adaptors/blacklight.rb', line 70
def handle(request)
  ids_processed = []
  holdings_added = 0

  if (@identifier_search && url = blacklight_precise_search_url(request) )
    doc = Nokogiri::XML( http_fetch(url).body )

    ids_processed.concat( bib_ids_from_atom_entries( doc.xpath("atom:feed/atom:entry", xml_ns) ) )

    # namespaces make xpath harder than it should be, but css
    # selector still easy, thanks nokogiri! Grab the marc from our
    # results.
    marc_matches = doc.xpath("atom:feed/atom:entry/atom:content[@type='application/marc']", xml_ns).collect do |encoded_marc21|
      MARC::Reader.decode( Base64.decode64(encoded_marc21.text) )
    end

    add_856_links(request, marc_matches )

    # Got to make a second fetch for dlf_expanded info, cause BL doesn't
    # (yet) let us ask for more than one at once
    holdings_url = blacklight_precise_search_url( request, "dlf_expanded" )
    holdings_added += add_holdings( holdings_url ) if holdings_url
  end

  # keyword search.
  if (@keyword_search && url = blacklight_keyword_search_url(request))
    doc = Nokogiri::XML( http_fetch(url).body )
    # filter out matches whose titles don't really match at all, or
    # which have already been seen in identifier search.
    entries = filter_keyword_entries( doc.xpath("atom:feed/atom:entry", xml_ns) , :exclude_ids => ids_processed, :remove_subtitle => (! title_is_serial?(request.referent)) )

    marc_by_atom_id = {}

    # Grab the marc from our entries. Important not to do a // xpath
    # search, or we'll wind up matching parent elements not actually
    # included in our 'entries' list.
    marc_matches = entries.xpath("atom:content[@type='application/marc']", xml_ns).collect do |encoded_marc21|
      marc = MARC::Reader.decode( Base64.decode64(encoded_marc21.text) )
      marc_by_atom_id[ encoded_marc21.at_xpath("ancestor::atom:entry/atom:id/text()", xml_ns).to_s ] = marc
      marc
    end

    # We've filtered out those we consider just plain bad
    # matches, everything else we're going to call
    # an approximate match. Sort so that those with
    # a date close to our request date are first.
    if ( year = get_year(request.referent))
      marc_matches = marc_matches.partition {|marc| get_years(marc).include?( year )}.flatten
    end
    # And add in the 856's
    add_856_links(request, marc_matches, :match_reliability => ServiceResponse::MatchUnsure)

    # Fetch and add in the holdings
    url = blacklight_url_for_ids(bib_ids_from_atom_entries(entries))

    holdings_added += add_holdings( url, :match_reliability => ServiceResponse::MatchUnsure, :marc_data => marc_by_atom_id ) if url

    if (@link_to_search && holdings_added == 0)
      hit_count = doc.at_xpath("atom:feed/opensearch:totalResults/text()", xml_ns).to_s.to_i
      html_result_url = doc.at_xpath("atom:feed/atom:link[@rel='alternate'][@type='text/html']/attribute::href", xml_ns).to_s

      if hit_count > 0
        request.add_service_response(
          {
            :service => self,
            :source_name => @display_name,
            :count => hit_count,
            :display_text => "#{hit_count} possible #{hit_count > 1 ? 'matches' : 'match'} in #{@display_name}",
            :url => html_result_url
          },
          [ServiceTypeValue[:holding_search]])
      end
    end
  end

  return request.dispatched(self, true)
end
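The date-based ordering in handle relies on partition rather than a true sort. A minimal sketch with plain hashes in place of MARC records follows; the record titles and years are made up.

```ruby
# Stand-ins for MARC records; :years plays the role of get_years(marc).
records = [
  { :title => "Edition A", :years => ["1998"] },
  { :title => "Edition B", :years => ["2004", "2005"] },
  { :title => "Edition C", :years => ["2004"] }
]
year = "2004"

# partition returns [matching, non_matching]; flatten joins the two lists,
# so records with the requested year come first and relative order within
# each group is preserved.
sorted = records.partition { |record| record[:years].include?(year) }.flatten

puts sorted.map { |record| record[:title] }.inspect
```

Note that this promotes only records whose years include the requested year exactly; records with a merely nearby year are not ranked any higher than the rest.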
- (Object) service_types_generated
Standard method, used by background service updater. See Service docs.
# File 'lib/service_adaptors/blacklight.rb', line 63
def service_types_generated
  types = [
    ServiceTypeValue[:fulltext],
    ServiceTypeValue[:holding],
    ServiceTypeValue[:table_of_contents],
    ServiceTypeValue[:relevant_link]
  ]

  return types
end