Module: GoogleSiteSearch

Defined in:
lib/google-site-search.rb,
lib/google-site-search/result.rb,
lib/google-site-search/search.rb,
lib/google-site-search/version.rb,
lib/google-site-search/url_builder.rb

Overview

A module to help query and parse the Google Site Search API.

Defined Under Namespace

Classes: ParsingError, Result, Search, UrlBuilder

Constant Summary

GOOGLE_SEARCH_URL =
"http://www.google.com"
DEFAULT_PARAMS =
{
  :client => "google-csbe",
  :output => "xml_no_dtd",
}
VERSION =
"0.0.8"
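
As a rough illustration of how these constants could combine into a request URL, the sketch below merges the default parameters with a search term and an engine id using only the standard library. The "/search" path, the helper name build_search_url, and the sample values are assumptions made for this example; the gem's own UrlBuilder may work differently.

```ruby
require "uri"

# Hypothetical helper, not part of the gem: builds a query URL from the
# documented constant values plus caller-supplied parameters.
def build_search_url(search_term, search_engine_id)
  base_url = "http://www.google.com"   # value of GOOGLE_SEARCH_URL
  default_params = {                   # value of DEFAULT_PARAMS
    :client => "google-csbe",
    :output => "xml_no_dtd",
  }
  params = default_params.merge(:q => search_term, :cx => search_engine_id)
  # The "/search" path is an assumption for this sketch.
  "#{base_url}/search?#{URI.encode_www_form(params)}"
end

puts build_search_url("ruby gems", "abc123")
```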

Class Method Details

.caching_key(url) ⇒ Object

Takes a URL, strips out query params that are not needed (currently only ei), and compresses a string representation of the rest. The intent is to have a small string to use as a caching key. Returns nil when no params remain.



# File 'lib/google-site-search.rb', line 32

def caching_key url
  params = Rack::Utils.parse_query(URI.parse(url).query)
  # "ei" is described as: "Passes on an alphanumeric parameter that decodes the
  # originating SERP where user clicked on a related search". Its exact purpose
  # is unclear, but keeping it makes caching less effective, so it is removed.
  params.delete("ei")
  # Sort the "keyvalue" pairs so that parameter order does not change the key.
  key = params.map{|k,v| k.to_s + v.to_s}.sort.join
  key.blank? ? nil : RSmaz.compress(key)
end
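
The normalisation step can be tried without Rack or RSmaz. The sketch below is a stand-in, not the gem's code: it uses URI.decode_www_form in place of Rack::Utils.parse_query and skips the compression step entirely, so it returns the uncompressed key. The name normalized_key and the sample URLs are made up for the example.

```ruby
require "uri"

# Hypothetical stand-in for caching_key: same normalisation, but built on the
# standard library and WITHOUT the RSmaz compression step.
def normalized_key(url)
  query = URI.parse(url).query
  return nil if query.nil? || query.empty?
  params = URI.decode_www_form(query).to_h
  params.delete("ei")                                   # drop the volatile param
  key = params.map { |k, v| k.to_s + v.to_s }.sort.join # order-independent key
  key.empty? ? nil : key
end

a = normalized_key("http://www.google.com/search?q=ruby&ei=abc&cx=123")
b = normalized_key("http://www.google.com/search?cx=123&q=ruby")
puts a        # cx123qruby
puts a == b   # true
```

Because the pairs are sorted before joining, two URLs that differ only in parameter order or in their ei value map to the same key.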

.paginate(url, search_engine_id) ⇒ Object

Expects the URL returned by Search#next_results_url or Search#previous_results_url. Returns it as an absolute URL string with the search engine id appended as the cx parameter.

Raises:

  • (StandardError)


# File 'lib/google-site-search.rb', line 41

def paginate url, search_engine_id
  raise StandardError, "search_engine_id required" if search_engine_id.blank? 
  uri = URI.parse(url.to_s)
  raise StandardError, "url seems to be invalid, parameters expected" if uri.query.blank?
  if uri.relative?
    uri.host = "www.google.com"
    uri.scheme = "http"
  end
  uri.query += "&cx=#{search_engine_id}"
  uri.to_s
end
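
To make the effect on a URL concrete, here is a minimal standard-library sketch of the same idea. It is not the gem's method: paginate_sketch is a hypothetical name, it uses URI.join to make relative URLs absolute, it raises ArgumentError rather than StandardError, and the sample URL and engine id are invented.

```ruby
require "uri"

# Hypothetical stand-in for paginate, written with URI.join.
def paginate_sketch(url, search_engine_id)
  raise ArgumentError, "search_engine_id required" if search_engine_id.to_s.empty?
  uri = URI.parse(url.to_s)
  raise ArgumentError, "parameters expected" if uri.query.to_s.empty?
  # Relative pagination links are resolved against the Google host.
  uri = URI.join("http://www.google.com", uri.to_s) if uri.relative?
  uri.query += "&cx=#{search_engine_id}"
  uri.to_s
end

puts paginate_sketch("/search?q=ruby&start=10", "abc123")
# http://www.google.com/search?q=ruby&start=10&cx=abc123
```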

.query(url, result_class = Result) {|search_result| ... } ⇒ Object

See Search - This is a convenience method for creating and querying. This method can accept a block which can access the resulting search object.

Yields:

  • (search_result)


# File 'lib/google-site-search.rb', line 55

def query url, result_class = Result, &block
  search_result = Search.new(url, result_class).query
  yield(search_result) if block_given?
  search_result
end
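
The block handling follows a common Ruby pattern: yield the object if a block was given, then return that same object. The sketch below shows only this pattern. A plain hash stands in for the Search object, and query_sketch is a hypothetical name; no request is made.

```ruby
# Hypothetical stand-in: a hash plays the role of the queried Search object.
def query_sketch(url)
  search_result = { :url => url }
  yield(search_result) if block_given?   # the block sees the result first
  search_result                          # and the caller gets the same object
end

seen = nil
returned = query_sketch("http://www.google.com/search?q=ruby") { |s| seen = s }
puts seen.equal?(returned)   # true
```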

.query_multiple(times, url, result_class = Result, &block) ⇒ Object

See Search - This allows you to retrieve up to (times) searches if they are available (i.e. it stops if a search has no next_results_url). This method can accept a block which can access each resulting search object.



# File 'lib/google-site-search.rb', line 64

def query_multiple times, url, result_class = Result, &block
  searchs = [query(url, result_class, &block).query]
  # Stop after `times` searches, or earlier if the last search has no next page.
  while (times = times - 1) > 0
    next_results_url = searchs.last.try(:next_results_url)
    break if next_results_url.blank?
    searchs << search_result = query(url, result_class, &block).query
  end
  searchs
end
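
The behaviour described (collect up to a fixed number of pages, stopping early when there is no next page) can be sketched independently of the gem. In the example below, fetch_up_to, the fetcher callable, and the in-memory page table are all hypothetical; the only name borrowed from the documentation is next_results_url.

```ruby
# Hypothetical helper: `fetcher` is any callable that takes a URL and returns
# a hash with a :next_results_url key (nil when there is no further page).
def fetch_up_to(times, url, fetcher)
  pages = []
  while times > 0 && url
    page = fetcher.call(url)
    yield(page) if block_given?
    pages << page
    url = page[:next_results_url]   # follow the link to the next page
    times -= 1
  end
  pages
end

# In-memory stand-in for the API: two pages, the second one is the last.
fake_pages = {
  "/search?q=ruby&start=0"  => { :start => 0,  :next_results_url => "/search?q=ruby&start=10" },
  "/search?q=ruby&start=10" => { :start => 10, :next_results_url => nil },
}
fetcher = ->(u) { fake_pages.fetch(u) }

pages = fetch_up_to(5, "/search?q=ruby&start=0", fetcher)
puts pages.length   # 2, because only two pages exist
```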

.relative_path(path) ⇒ Object

Google returns a result link as an absolute URL, but you may want a relative version. A path that is already relative is returned unchanged.



# File 'lib/google-site-search.rb', line 82

def relative_path path
  uri = URI.parse(path)
  uri.relative? ? path : [uri.path,uri.query].compact.join("?")
end
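
Since this method only depends on URI, its behaviour is easy to check in isolation. The sketch below repeats the two-line body as a top-level method purely for demonstration; the example.com URLs are made up.

```ruby
require "uri"

# Same logic as the listing above, copied here so it runs outside the module.
def relative_path(path)
  uri = URI.parse(path)
  uri.relative? ? path : [uri.path, uri.query].compact.join("?")
end

puts relative_path("http://www.example.com/docs/page.html?x=1")  # /docs/page.html?x=1
puts relative_path("http://www.example.com/docs")                # /docs
puts relative_path("/already/relative")                          # /already/relative
```

Note that the scheme and host are dropped, and compact removes a missing query so no trailing "?" is left behind.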

.request_xml(url) ⇒ Object

Makes a request to the Google search API and returns the XML response as a string, or nil if the response is not a success.



# File 'lib/google-site-search.rb', line 75

def request_xml url
  response = Net::HTTP.get_response(URI.parse(url.to_s))
  response.body if response.is_a?(Net::HTTPSuccess)
end

.separate_search_term_from_filters(string) ⇒ Object

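Google’s API gives back the full query with the filter options appended. This method separates the search term from the filters so they can be handled individually. Returns a two-element array [search_term, filters], where filters is nil if none are present.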



# File 'lib/google-site-search.rb', line 88

def separate_search_term_from_filters(string)
  match = /\smore:p.*/.match(string)
  return [string, nil] if match.nil?
  return [match.pre_match.strip, match[0].strip]
end
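
The split relies only on a regular expression, so it can be tried on its own. The sketch below repeats the method body at the top level for demonstration; the sample query strings, including the filter label, are invented.

```ruby
# Same logic as the listing above, copied here so it runs outside the module.
def separate_search_term_from_filters(string)
  match = /\smore:p.*/.match(string)
  return [string, nil] if match.nil?
  [match.pre_match.strip, match[0].strip]
end

term, filters = separate_search_term_from_filters("ruby gems more:pagemap:document-type:article")
puts term      # ruby gems
puts filters   # more:pagemap:document-type:article

p separate_search_term_from_filters("ruby gems")   # ["ruby gems", nil]
```

Everything from the first whitespace followed by "more:p" onward is treated as the filter part.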