Module: BentoSearch::SearchEngine
- Extended by:
- ActiveSupport::Concern
- Includes:
- Capabilities
- Included in:
- DoajArticlesEngine, EbscoHostEngine, EdsEngine, GoogleBooksEngine, GoogleSiteSearchEngine, JournalTocsForJournal, MockEngine, PrimoEngine, ScopusEngine, SummonEngine, WorldcatSruDcEngine, XerxesEngine
- Defined in:
- app/models/bento_search/search_engine.rb
Overview
Module mix-in for bento_search search engines.
Using a SearchEngine
See a whole bunch more examples in the project README.
You can initialize a search engine with configuration (some engines have required configuration):
engine = SomeSearchEngine.new(:config_key => 'foo')
Or, it can be convenient (and is required for some features) to store a search engine with configuration in a global registry:
BentoSearch.register_engine("some_searcher") do |config|
config.engine = "SomeSearchEngine"
config.config_key = "foo"
end
# instantiates a new engine with registered config:
engine = BentoSearch.get_engine("some_searcher")
You can then use the #search method, which returns an instance of
of BentoSearch::Results
results = engine.search("query")
See more docs under #search, as well as project README.
Standard configuration variables.
Some engines require their own engine-specific configuration for api keys and such, and offer their own engine-specific configuration for engine-specific features.
An additional semi-standard configuration variable, some engines take an ‘:auth => true` to tell the engine to assume that all access is by authenticated local users who should be given elevated access to results.
Additional standard configuration keys that are implemented by the bento_search framework:
[for_display.decorator]
String name of decorator class that will be applied by #bento_decorate
helper in standard view. See wiki for more info on decorators. Must be
string name, actual class object not supported (to make it easier
to serialize and transport configuration).
- log_failed_results
-
Default false, if true all failed results are logged to ‘Rails.log.error`. Can set global default with
`BentoSearch.defaults.log_failed_results = true`
Implementing a SearchEngine
Implmeneting a new SearchEngine is relatively straightforward – you are generally only responsible for the parts specific to your search engine: receiving a query, making a call to the external search engine, and translating it’s result to standard a BentoSearch::Results full of BentoSearch::ResultItems.
Start out by simply including the search engine module:
class MyEngine
include BentoSearch::SearchEngine
end
Next, at a minimum, you need to implement a #search_implementation method, which takes a normalized hash of search instructions as input (see documentation at #normalized_search_arguments), and returns BentoSearch::Results item.
The Results object should have #total_items set with total hitcount, and contain BentoSearch::ResultItem objects for each hit in the current page. See individual class documentation for more info.
That’s about the extent of your responsibilities. If the search failed for some reason due to an error, you should return a Results object with it’s #error object set, so it will be ‘failed?`. The framework will take care of this for you for certain uncaught exceptions you allow to rise out of #search_implementation (timeouts, HTTPClient timeouts, nokogiri and MultiJson parse errors).
A SearchEngine object can be re-used for multiple searches, possibly under concurrent multi-threading. Do not store search-specific state in the search object. but you can store configuration-specific state there of course.
Recommend use of HTTPClient, if possible, for http searches. Especially using a class-level HTTPClient instance, to re-use persistent http connections accross searches (can be esp important if you need to contact external search api via https/ssl).
If you have required configuration keys, you can register that with class-level required_configuration_keys method.
You can also advertise max per-page value by overriding max_per_page.
If you support fielded searching, you should over-ride #search_field_definitions; if you support sorting, you should override #sort_definitions. See BentoSearch::SearchEngine::Capabilities module for documentation.
Defined Under Namespace
Modules: Capabilities
Constant Summary collapse
- DefaultPerPage =
10
Instance Method Summary collapse
-
#display_configuration ⇒ Object
Cover method for consistent api with Results.
-
#engine_id ⇒ Object
Cover method for consistent api with Results.
-
#fill_in_search_metadata_for(results, normalized_arguments = {}) ⇒ Object
SOME of the elements of Results to be returned that SearchEngine implementation fills in automatically post-search.
-
#initialize(aConfiguration = Confstruct::Configuration.new) ⇒ Object
If specific SearchEngine calls initialize, you want to call super handles configuration loading, mostly.
-
#normalized_search_arguments(*orig_arguments) ⇒ Object
(also: #parse_search_arguments)
Take the arguments passed into #search, which can be flexibly given in several ways, and normalize to an expected single hash that will be passed to an engine’s #search_implementation.
-
#public_settable_search_args ⇒ Object
Used mainly/only by the AJAX results loading.
-
#search(*arguments) ⇒ Object
Method used to actually get results from a search engine.
Methods included from Capabilities
#max_per_page, #multi_field_search?, #search_field_definitions, #search_keys, #semantic_search_keys, #semantic_search_map, #sort_definitions, #sort_keys
Instance Method Details
#display_configuration ⇒ Object
Cover method for consistent api with Results
444 445 446 |
# File 'app/models/bento_search/search_engine.rb', line 444 def display_configuration configuration.for_display end |
#engine_id ⇒ Object
Cover method for consistent api with Results
449 450 451 |
# File 'app/models/bento_search/search_engine.rb', line 449 def engine_id configuration.id end |
#fill_in_search_metadata_for(results, normalized_arguments = {}) ⇒ Object
SOME of the elements of Results to be returned that SearchEngine implementation fills in automatically post-search. Extracted into a method for DRY in error handling to try to fill these in even in errors. Also can be used as public method for de-serialized or mock results.
300 301 302 303 304 305 306 307 308 309 310 311 312 313 314 315 316 317 318 |
# File 'app/models/bento_search/search_engine.rb', line 300 def (results, normalized_arguments = {}) results.search_args = normalized_arguments results.start = normalized_arguments[:start] || 0 results.per_page = normalized_arguments[:per_page] results.engine_id = configuration.id results.display_configuration = configuration.for_display # We copy some configuraton info over to each Item, as a convenience # to display logic that may have decide what to do given only an item, # and may want to parameterize based on configuration. results.each do |item| item.engine_id = configuration.id item.decorator = configuration.lookup!("for_display.decorator") item.display_configuration = configuration.for_display end results end |
#initialize(aConfiguration = Confstruct::Configuration.new) ⇒ Object
If specific SearchEngine calls initialize, you want to call super handles configuration loading, mostly. Argument is a Confstruct::Configuration or Hash.
178 179 180 181 182 183 184 185 186 187 188 189 190 191 192 193 194 195 196 197 198 199 200 201 202 203 204 205 206 207 208 209 210 211 |
# File 'app/models/bento_search/search_engine.rb', line 178 def initialize(aConfiguration = Confstruct::Configuration.new) # To work around weird confstruct bug, we need to change # a hash to a Confstruct ourselves. # https://github.com/mbklein/confstruct/issues/14 unless aConfiguration.kind_of? Confstruct::Configuration aConfiguration = Confstruct::Configuration.new aConfiguration end # init, from copy of default, or new if self.class.default_configuration self.configuration = Confstruct::Configuration.new(self.class.default_configuration) else self.configuration = Confstruct::Configuration.new end # merge in current instance config self.configuration.configure ( aConfiguration ) # global defaults? self.configuration[:for_display] ||= {} unless self.configuration.has_key?(:log_failed_results) self.configuration[:log_failed_results] = BentoSearch.defaults.log_failed_results end # check for required keys -- have to be present, and not nil if self.class.required_configuration self.class.required_configuration.each do |required_key| if ["**NOT_FOUND**", nil].include? self.configuration.lookup!(required_key.to_s, "**NOT_FOUND**") raise ArgumentError.new("#{self.class.name} requires configuration key #{required_key}") end end end end |
#normalized_search_arguments(*orig_arguments) ⇒ Object Also known as: parse_search_arguments
Take the arguments passed into #search, which can be flexibly given in several ways, and normalize to an expected single hash that will be passed to an engine’s #search_implementation. The output of this method is a single hash, and is what a #search_implementation can expect to receive as an argument, with keys:
- :query
-
the query
- :per_page
-
will always be present, using the default per_page if none given by caller
- :start, :page
-
both :start and :page will always be present, regardless of which the caller used. They will both be integers, even if strings passed in.
- :search_field
-
A search field from the engine’s #search_field_definitions, as string. Even if the caller used :semantic_search_field, it’ll be normalized to the actual local search_field key on output.
- :sort
-
Sort key.
337 338 339 340 341 342 343 344 345 346 347 348 349 350 351 352 353 354 355 356 357 358 359 360 361 362 363 364 365 366 367 368 369 370 371 372 373 374 375 376 377 378 379 380 381 382 383 384 385 386 387 388 389 390 391 392 393 394 395 396 397 398 399 400 401 402 403 404 405 406 407 408 409 410 411 412 413 414 415 416 417 418 419 420 421 422 423 424 425 426 427 428 429 |
# File 'app/models/bento_search/search_engine.rb', line 337 def normalized_search_arguments(*orig_arguments) arguments = {} # Two-arg style to one hash, if present if (orig_arguments.length > 1 || (orig_arguments.length == 1 && ! orig_arguments.first.kind_of?(Hash))) arguments[:query] = orig_arguments.delete_at(0) end arguments.merge!(orig_arguments.first) if orig_arguments.length > 0 # allow strings for pagination (like from url query), change to # int please. [:page, :per_page, :start].each do |key| arguments.delete(key) if arguments[key].blank? arguments[key] = arguments[key].to_i if arguments[key] end arguments[:per_page] ||= configuration.default_per_page || DefaultPerPage # illegal arguments if (arguments[:start] && arguments[:page]) raise ArgumentError.new("Can't supply both :page and :start") end if ( arguments[:per_page] && self.max_per_page && arguments[:per_page] > self.max_per_page) raise ArgumentError.new("#{arguments[:per_page]} is more than maximum :per_page of #{self.max_per_page} for #{self.class}") end # Normalize :page to :start, and vice versa if arguments[:page] arguments[:start] = (arguments[:page] - 1) * arguments[:per_page] elsif arguments[:start] arguments[:page] = (arguments[:start] / arguments[:per_page]) + 1 end # normalize :sort from possibly symbol to string # TODO: raise if unrecognized sort key? if arguments[:sort] arguments[:sort] = arguments[:sort].to_s end # Multi-field search if arguments[:query].kind_of? Hash # Only if allowed unless self.multi_field_search? raise ArgumentError.new("You supplied a :query as a hash, but this engine (#{self.class}) does not suport multi-search. #{arguments[:query].inspect}") end # Multi-field search incompatible with :search_field or :semantic_search_field if arguments[:search_field].present? raise ArgumentError.new("You supplied a :query as a Hash, but also a :search_field, you can only use one. #{arguments.inspect}") end if arguments[:semantic_search_field].present? raise ArgumentError.new("You supplied a :query as a Hash, but also a :semantic_search_field, you can only use one. #{arguments.inspect}") end # translate semantic fields, raising for unfound fields if configured arguments[:query].transform_keys! do |key| new_key = self.semantic_search_map[key.to_s] || key if ( config_arg(arguments, :unrecognized_search_field) == "raise" && ! self.search_keys.include?(new_key)) raise ArgumentError.new("#{self.class.name} does not know about search_field #{new_key}, in query Hash #{arguments[:query]}") end new_key end end # translate semantic_search_field to search_field, or raise if # can't. if (semantic = arguments.delete(:semantic_search_field)) && ! semantic.blank? semantic = semantic.to_s # Legacy publication_title is now called source_title semantic = "source_title" if semantic == "publication_title" mapped = self.semantic_search_map[semantic] if config_arg(arguments, :unrecognized_search_field) == "raise" && ! mapped raise ArgumentError.new("#{self.class.name} does not know about :semantic_search_field #{semantic}") end arguments[:search_field] = mapped end if config_arg(arguments, :unrecognized_search_field) == "raise" && ! search_keys.include?(arguments[:search_field]) raise ArgumentError.new("#{self.class.name} does not know about :search_field #{arguments[:search_field]}") end return arguments end |
#public_settable_search_args ⇒ Object
Used mainly/only by the AJAX results loading. an array WHITELIST of attributes that can be sent as non-verified request params and used to execute a search. For instance, ‘auth’ is NOT on there, you can’t trust a web request as to ‘auth’ status. individual engines may over-ride, call super, and add additional engine-specific attributes.
439 440 441 |
# File 'app/models/bento_search/search_engine.rb', line 439 def public_settable_search_args [:query, :search_field, :semantic_search_field, :sort, :page, :start, :per_page] end |
#search(*arguments) ⇒ Object
Method used to actually get results from a search engine.
When implementing a search engine, you do not override this #search method, but instead override #search_implementation. #search will call your specific #search_implementation, first normalizing the query arguments, and then normalizing and adding standard metadata to your return value.
Most engines support pagination, sorting, and searching in a specific
field.
# 1-based page index
engine.search("query", :per_page => 20, :page => 5)
# or use 0-based per-record index, engines that don't
# support this will round to nearest page.
engine.search("query", :start => 20)
You can ask an engine what search fields it supports with engine.search_keys
engine.search("query", :search_field => "engine_search_field_name")
There are also normalized 'semantic' names you can use accross engines
(if they support them): :title, :author, :subject, maybe more.
engine.search("query", :semantic_search_field => :title)
Ask an engine what semantic field names it supports with `engine.semantic_search_keys`
Unrecognized search fields will be ignored, unless you pass in
:unrecognized_search_field => :raise (or do same in config).
Ask an engine what sort fields it supports with `engine.sort_keys`. See
list of standard sort keys in I18n file at ./config/locales/en.yml, in
`en.bento_search.sort_keys`.
engine.search("query", :sort => "some_sort_key")
Some engines support additional arguments to 'search', see individual
engine documentation. For instance, some engines support `:auth => true`
to give the user elevated search privileges when you have an authenticated
local user.
Query as first arg is just a convenience, you can also use a single hash argument.
engine.search(:query => "query", :per_page => 20, :page => 4)
259 260 261 262 263 264 265 266 267 268 269 270 271 272 273 274 275 276 277 278 279 280 281 282 283 284 285 286 287 288 289 290 291 292 293 294 |
# File 'app/models/bento_search/search_engine.rb', line 259 def search(*arguments) start_t = Time.now arguments = normalized_search_arguments(*arguments) results = search_implementation(arguments) (results, arguments) results.timing = (Time.now - start_t) return results rescue *auto_rescue_exceptions => e # Uncaught exception, log and turn into failed Results object. We # only catch certain types of exceptions, or it makes dev really # confusing eating exceptions. This is intentionally a convenience # to allow search engine implementations to just raise the exception # and we'll turn it into a proper error. cleaned_backtrace = Rails.backtrace_cleaner.clean(e.backtrace) log_msg = "BentoSearch::SearchEngine failed results: #{e.inspect}\n #{cleaned_backtrace.join("\n ")}" Rails.logger.error log_msg failed = BentoSearch::Results.new failed.error ||= {} failed.error[:exception] = e failed.timing = (Time.now - start_t) (failed, arguments) return failed ensure if results && configuration.log_failed_results && results.failed? Rails.logger.error("Error fetching results for `#{configuration.id || self}`: #{arguments}: #{results.error}") end end |