Class: Triannon::SolrSearcher

Inherits:
Object
Defined in:
app/services/triannon/solr_searcher.rb

Constructor Details

#initialize ⇒ SolrSearcher

Returns a new instance of SolrSearcher.



# File 'app/services/triannon/solr_searcher.rb', line 119

def initialize
  @rsolr_client = RSolr.connect :url => Triannon.config[:solr_url]
  @logger = Rails.logger
  @max_retries = Triannon.config[:max_solr_retries] || 5
  @base_sleep_seconds = Triannon.config[:base_sleep_seconds] || 1
  @max_sleep_seconds = Triannon.config[:max_sleep_seconds] || 5
end
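
The listing on this page does not show how the three retry settings are consumed, so the following is only a sketch of one plausible use: a capped exponential backoff around a failing call. The helper name `with_backoff` and its `sleeper` parameter are hypothetical and not part of Triannon; only the setting names and their defaults (5, 1, 5) come from the constructor above.

```ruby
# Hypothetical helper, not part of Triannon: retries a block with capped
# exponential backoff, driven by settings named like those in #initialize.
def with_backoff(max_retries: 5, base_sleep_seconds: 1, max_sleep_seconds: 5,
                 sleeper: ->(seconds) { sleep(seconds) })
  attempts = 0
  begin
    attempts += 1
    yield attempts
  rescue StandardError
    # Give up once the configured number of attempts has been used.
    raise if attempts >= max_retries
    # Double the wait after each failure, never exceeding the cap.
    delay = [base_sleep_seconds * (2**(attempts - 1)), max_sleep_seconds].min
    sleeper.call(delay)
    retry
  end
end

# Usage: a call that fails twice, then succeeds. The recording sleeper
# (an assumption for the demo) avoids real waiting.
delays = []
result = with_backoff(sleeper: ->(seconds) { delays << seconds }) do |attempt|
  raise IOError, 'Solr unavailable' if attempt < 3
  "ok after #{attempt} attempts"
end
# delays is [1, 2] and result is "ok after 3 attempts"
```

Injecting the sleeper keeps the sketch testable without waiting; real code would simply use the default.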

Instance Attribute Details

#rsolr_client ⇒ Object

Returns the value of attribute rsolr_client.



# File 'app/services/triannon/solr_searcher.rb', line 117

def rsolr_client
  @rsolr_client
end

Class Method Details

.anno_graphs_array(rsolr_response) ⇒ Array<Triannon::Graph>

Converts an RSolr::Response object into an array of Triannon::Graph objects, where each graph object contains a single annotation returned in the response docs.

Parameters:

  • rsolr_response (Hash)

    an RSolr response to a query (strictly speaking, an RSolr::HashWithResponse)

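Returns:

  • (Array<Triannon::Graph>)

    array of Triannon::Graph objects, where each graph object contains a single annotation returned in the response docs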



# File 'app/services/triannon/solr_searcher.rb', line 9

def self.anno_graphs_array(rsolr_response)
  result = []
  # TODO: deal with Solr pagination
  rsolr_response['response']['docs'].each { |solr_doc_hash|
    result << Triannon::Graph.new(RDF::Graph.new.from_jsonld(solr_doc_hash['anno_jsonld']))
  }
  result
end
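
To make the expected input shape concrete, here is a minimal sketch that walks the same `'response'` → `'docs'` → `'anno_jsonld'` path as the method above. The response hash is hand-built for illustration, the document ids and URLs are invented, and the helper `anno_ids` is hypothetical: it parses each JSON-LD string with the standard library instead of building a Triannon::Graph, so that it runs without Triannon or the RDF gems.

```ruby
require 'json'

# Hand-built stand-in for an RSolr response (assumption: real responses
# carry more keys; only the path used below is taken from the method above).
rsolr_response = {
  'response' => {
    'numFound' => 2,
    'docs' => [
      { 'id' => 'anno1', 'anno_jsonld' => '{"@id":"http://example.org/annos/1"}' },
      { 'id' => 'anno2', 'anno_jsonld' => '{"@id":"http://example.org/annos/2"}' }
    ]
  }
}

# Hypothetical helper: same traversal as anno_graphs_array, but each
# JSON-LD string is parsed to a Hash and only its "@id" is kept.
def anno_ids(response)
  response['response']['docs'].map { |doc| JSON.parse(doc['anno_jsonld'])['@id'] }
end

puts anno_ids(rsolr_response)
```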

.q_terms_for_url(fieldname, url) ⇒ Array<String>

If the url contains a fragment, query terms should match only the exact url given, including that fragment (e.g. foo.org#bar does not match foo.org).

If the url does NOT contain a fragment, query terms should match the url given (no fragment) AND any urls that are the same with a fragment added (e.g. foo.org matches foo.org#bar).

Parameters:

  • fieldname (String)

    the name of the Solr field to be searched with url as a value

  • url (String)

    the url value sought in the Solr field

Returns:

  • (Array<String>)

    an array of query terms to be added to the Solr q argument



# File 'app/services/triannon/solr_searcher.rb', line 106

def self.q_terms_for_url(fieldname, url)
  q_terms = []
  q_terms << "#{fieldname}:#{RSolr.solr_escape(url)}"
  if !url.include? '#'
    # Note: do NOT Solr escape the # (unnec) or the * (want Solr to view it as wildcard)
    q_terms << "#{fieldname}:#{RSolr.solr_escape(url)}#*"
  end
  q_terms
end
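
The fragment rule can be tried out without the rsolr gem using the sketch below. `escape_for_solr` is a hypothetical stand-in for `RSolr.solr_escape`, written on the assumption that the real method backslash-escapes Lucene query syntax characters; the exact character set RSolr uses may differ.

```ruby
# Hypothetical stand-in for RSolr.solr_escape so the sketch runs without
# the rsolr gem. Assumption: backslash-escape Lucene query syntax characters.
def escape_for_solr(str)
  str.gsub(/[+\-&|!(){}\[\]^"~*?:\\\/]/) { |char| "\\#{char}" }
end

# Same logic as the method above, using the stand-in escaper.
def q_terms_for_url(fieldname, url)
  terms = ["#{fieldname}:#{escape_for_solr(url)}"]
  # No fragment given: also match the same url with any fragment appended.
  # The trailing #* is left unescaped so Solr treats * as a wildcard.
  terms << "#{fieldname}:#{escape_for_solr(url)}#*" unless url.include?('#')
  terms
end

puts q_terms_for_url('target_url', 'http://foo.org')      # two terms
puts q_terms_for_url('target_url', 'http://foo.org#bar')  # one term
```

With this stand-in, a url without a fragment yields two terms (the exact url and a `#*` wildcard variant), while a url with a fragment yields only the exact term.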

.solr_params(controller_params) ⇒ Hash

Note:

The generated search params hardcode expectations about the Solr search service.

Note:

The mapping of REST params for /search to Solr params is hardcoded.

Converts action request params to the appropriate params to be sent to the search service as part of a search request.

Request params are given in “Annotation Lists in Triannon” by Robert Sanderson in Google Docs:
  • targetUri, value is a URI

  • bodyUri, value is a URI

  • bodyExact, value is a string

  • bodyKeyword, value is a string

  • bodyType, value is a URI

  • motivatedBy, value is a URI (or just the fragment portion)

  • annotatedBy, value is a URI

  • annotatedAt, value is a datetime

Parameters:

  • controller_params (Hash<String => String>)

    params from Controller

Returns:

  • (Hash)

    params to send to Solr as a Hash



# File 'app/services/triannon/solr_searcher.rb', line 38

def self.solr_params(controller_params)
  solr_params_hash = {}
  q_terms_array = []
  fq_terms_array = []

  controller_params.each_pair { |k, v|
    case k.downcase
      when 'targeturi'
        q_terms_array << q_terms_for_url("target_url", v)
      when 'bodyuri'
        q_terms_array << q_terms_for_url("body_url", v)
      when 'bodyexact'
        # no need to Solr escape value because it's in quotes
        q_terms_array << "body_chars_exact:\"#{v}\""
      when 'motivatedby'
        case
          when v.include?('#')
            # we want fragment portion of URL value only, as that
            # is what is in Solr
            fq_terms_array << "motivation:#{RSolr.solr_escape(v.sub(/^.*#/, ''))}"
          when v == "http://www.shared-canvas.org/ns/painting", v == "sc:painting"
            fq_terms_array << "motivation:painting"
          else
            fq_terms_array << "motivation:#{RSolr.solr_escape(v)}"
        end
      when 'bodykeyword'
        solr_params_hash[:kqf] = 'body_chars_exact^3 body_chars_unstem^2 body_chars_stem'
        solr_params_hash[:kpf] = 'body_chars_exact^15 body_chars_unstem^10 body_chars_stem^5'
        solr_params_hash[:kpf3] = 'body_chars_exact^9 body_chars_unstem^6 body_chars_stem^3'
        solr_params_hash[:kpf2] = 'body_chars_exact^6 body_chars_unstem^4 body_chars_stem^2'
        q_terms_array << '_query_:"{!dismax qf=$kqf pf=$kpf pf3=$kpf3 pf2=$kpf2}' + RSolr.solr_escape(v) + '"'

      # TODO: add'l params to implement:
      # targetType - fq
      # bodyType - fq
      # annotatedAt - fq (deal with time format and wildcard for specificity)
      # annotatedBy - q (may be incomplete string)
    end
  }

  # flatten in place: q_terms_for_url returns arrays, so entries may be nested
  q_terms_array.flatten!
  if q_terms_array.size > 0
    solr_params_hash[:q] = q_terms_array.join(' AND ')
    solr_params_hash[:defType] = "lucene"
  end
  if fq_terms_array.size > 0
    solr_params_hash[:fq] = fq_terms_array
  end

  solr_params_hash

  # TODO:  integration tests for
  #  target_url with and without the scheme prefix
  #  target_url with and without fragment
  #  bodykeyword single terms, multiple terms, quoted strings ...

end
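
The `motivatedBy` branch is the least obvious part of the mapping, so the sketch below isolates it. The helper `motivation_fq`, the constant `PAINTING_VALUES`, and the stand-in escaper `escape_for_solr` are all hypothetical names introduced for illustration; the three cases themselves mirror the `case` statement above. The example URI `http://www.w3.org/ns/oa#commenting` is an assumed input, not one taken from the source.

```ruby
# Hypothetical stand-in for RSolr.solr_escape (assumption: it backslash-escapes
# Lucene query syntax characters), so the sketch runs without the rsolr gem.
def escape_for_solr(str)
  str.gsub(/[+\-&|!(){}\[\]^"~*?:\\\/]/) { |char| "\\#{char}" }
end

# The two spellings the method above maps to the bare value "painting".
PAINTING_VALUES = ['http://www.shared-canvas.org/ns/painting', 'sc:painting'].freeze

# Hypothetical helper: builds the fq term for a motivatedBy value.
def motivation_fq(value)
  if value.include?('#')
    # Only the fragment portion of the URI is what is stored in Solr.
    "motivation:#{escape_for_solr(value.sub(/^.*#/, ''))}"
  elsif PAINTING_VALUES.include?(value)
    'motivation:painting'
  else
    "motivation:#{escape_for_solr(value)}"
  end
end

puts motivation_fq('http://www.w3.org/ns/oa#commenting')
puts motivation_fq('sc:painting')
puts motivation_fq('commenting')
```

Because these terms go into `fq` rather than `q`, they filter the result set without affecting relevance scoring.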

Instance Method Details

#find(controller_params) ⇒ Array<Triannon::Graph>

to be called from controller:

1.  converts controller params to solr params
2.  sends request to Solr
3.  converts Solr response object to array of anno graphs

Parameters:

  • controller_params (Hash<String => String>)

    params from Controller

Returns:

  • (Array<Triannon::Graph>)

    array of Triannon::Graph objects, where each graph object contains a single annotation returned in the response docs



# File 'app/services/triannon/solr_searcher.rb', line 134

def find(controller_params)
  solr_params = self.class.solr_params(controller_params)
  solr_response = search(solr_params)
  anno_graphs_array = self.class.anno_graphs_array(solr_response)
end