Module: GoGetter
Defined in:
- lib/go_getter/utils.rb
- lib/go_getter/go_getter.rb
Constant Summary
- USER_AGENTS =
Some user agents for use with websites that change their behavior according to your browser. To send one, add it to http_headers as "User-Agent" => USER_AGENTS[key], where key is one of the symbols below (e.g. :chrome10_win). Use www.useragentstring.com/pages/useragentstring.php to find more user agent strings.
{
  :chrome10_win => "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Chrome/10.0.638.0 Safari/534.16",
  :chrome10_linux => "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Ubuntu/10.10 Chromium/10.0.648.0 Chrome/10.0.648.0 Safari/534.16",
  :firefox36_win => "Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2.8) Gecko/20100722 Firefox/3.6.8 ( .NET CLR 3.5.30729; .NET4.0C)",
  :firefox36_linux => "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.8) Gecko/20100804 Gentoo Firefox/3.6.8",
  :ie8 => "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.1; WOW64; Trident/4.0; SLCC2; Media Center PC 6.0; InfoPath.2; MS-RTC LM 8)",
  :ie7 => "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.1; WOW64; SLCC2; .NET CLR 2.0.50727; InfoPath.3; .NET4.0C; .NET4.0E; .NET CLR 3.5.30729; .NET CLR 3.0.30729; MS-RTC LM 8)",
  :opera11_win => "Opera/9.80 (Windows NT 6.0; U; en) Presto/2.7.39 Version/11.00",
  :safari5_mac => "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_3; en-us) AppleWebKit/534.1+ (KHTML, like Gecko) Version/5.0 Safari/533.16",
}
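A minimal sketch of how one of these strings ends up on a request, using only Net::HTTP so it runs offline and without the gem installed. The one-entry AGENTS hash is a local stand-in copied from the table above, not the gem's constant. Assigning with request[key] = value replaces the "Ruby" User-Agent that Net::HTTP fills in by default; Net::HTTPHeader#add_field would instead append a second value to that field.

```ruby
require 'net/http'

# Local stand-in for GoGetter::USER_AGENTS (a single entry copied from the table above).
AGENTS = {
  chrome10_linux: "Mozilla/5.0 (X11; U; Linux x86_64; en-US) AppleWebKit/534.16 (KHTML, like Gecko) Ubuntu/10.10 Chromium/10.0.648.0 Chrome/10.0.648.0 Safari/534.16"
}.freeze

# The headers hash a caller would pass to GoGetter.get as http_headers.
http_headers = { "User-Agent" => AGENTS[:chrome10_linux] }

request = Net::HTTP::Get.new("/")
# Net::HTTP pre-fills User-Agent with "Ruby"; []= overwrites that default.
http_headers.each { |key, value| request[key] = value }

puts request["User-Agent"]
```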
Class Method Summary
- .get(uri, http_headers = {}, params = {}) ⇒ Object
- .handle_redirection(from_uri, response, http_headers, params) ⇒ Object
- .parse_url(url) ⇒ Object
  Given a URL, which may not be formatted properly, parse a URI.
Class Method Details
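.get(uri, http_headers = {}, params = {}) ⇒ Object
Performs an HTTP GET for uri (a URI, or any other value whose string form is passed through parse_url) and returns the response. Each entry of http_headers is set on the request. params may supply :auth_user and :auth_pass (basic authentication), :proxy_host, :proxy_port, :proxy_user and :proxy_pass (proxy), :read_timeout (seconds, default 600) and :max_redirects (default 1). For a non-redirect response, final_uri is set to the URI that was requested.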
# File 'lib/go_getter/go_getter.rb', line 11
def GoGetter.get(uri, http_headers = {}, params = {})
  uri = parse_url(uri.to_s) unless uri.is_a? URI
  # copy the path so that appending the query does not modify uri itself
  path = uri.path.dup
  path << "?#{uri.query}" if uri.query
  request = Net::HTTP::Get.new(path)
  # []= replaces a header (e.g. Net::HTTP's default User-Agent) instead of appending to it
  http_headers.each { |key, value| request[key] = value }
  # basic authentication
  request.basic_auth(params[:auth_user], params[:auth_pass]) if params[:auth_user] && params[:auth_pass]
  # proxy
  klass = if params[:proxy_host] && params[:proxy_port]
            Net::HTTP::Proxy(params[:proxy_host], params[:proxy_port], params[:proxy_user], params[:proxy_pass])
          else
            Net::HTTP
          end
  # SSL (VERIFY_NONE skips verification of the server certificate)
  opt = (uri.scheme == "https") ? { use_ssl: true, verify_mode: OpenSSL::SSL::VERIFY_NONE } : {}
  response = klass.start(uri.host, uri.port, opt) do |http|
    http.read_timeout = params.fetch(:read_timeout, 600)
    http.request(request)
  end
  if response.is_a?(Net::HTTPRedirection)
    # Redirect: allow a single redirection by default
    params[:max_redirects] = 1 unless params.has_key?(:max_redirects)
    response = handle_redirection(uri, response, http_headers, params)
  else
    response.final_uri = uri
  end
  response
end
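The request-building half of get can be exercised without a server. The sketch below uses a hypothetical helper, build_get_request (not part of GoGetter), that repeats the path, header, basic-auth and SSL-option steps but returns the request and the options hash instead of calling Net::HTTP.start. The URL and credentials are made-up examples.

```ruby
require 'net/http'
require 'openssl'
require 'uri'

# Hypothetical helper (not part of GoGetter): repeats the request-building steps
# of GoGetter.get but returns the request and connection options instead of
# opening a connection, so it can run offline.
def build_get_request(uri, http_headers = {}, params = {})
  path = uri.path.dup                     # copy, so uri itself is not modified
  path << "?#{uri.query}" if uri.query
  request = Net::HTTP::Get.new(path)
  http_headers.each { |key, value| request[key] = value }
  if params[:auth_user] && params[:auth_pass]
    request.basic_auth(params[:auth_user], params[:auth_pass])
  end
  # options GoGetter.get hands to Net::HTTP.start; https turns on SSL
  opt = uri.scheme == "https" ? { use_ssl: true, verify_mode: OpenSSL::SSL::VERIFY_NONE } : {}
  [request, opt]
end

uri = URI.parse("https://example.com/search?q=ruby")
request, opt = build_get_request(uri, { "Accept" => "text/html" },
                                 { auth_user: "me", auth_pass: "secret" })
puts request.path        # /search?q=ruby
puts request["Accept"]   # text/html
puts opt[:use_ssl]       # true
puts uri                 # https://example.com/search?q=ruby (unchanged)
```

Separating request construction from the connection like this is one way to unit-test header and path handling; the live call then only wraps http.request(request) in klass.start.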
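.handle_redirection(from_uri, response, http_headers, params) ⇒ Object
Follows a redirect by requesting the URI in the response's Location header, provided the redirect budget in params[:max_redirects] is not exhausted and the new URI has not already been visited (tracked in params[:uris_seen]). Otherwise returns response unchanged.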
# File 'lib/go_getter/go_getter.rb', line 57
def GoGetter.handle_redirection(from_uri, response, http_headers, params)
  location = response['Location']
  # max_redirects is the remaining budget; it is decremented before each follow,
  # so it is the only counter needed
  if params.fetch(:max_redirects, 0) > 0 && location
    params[:uris_seen] ||= Set.new
    params[:uris_seen] << from_uri
    # Location may be relative (e.g. just a path); resolve it against the old URI,
    # which supplies the scheme, host and port when they are missing
    new_uri = from_uri.merge(location)
    # avoid infinite redirect loops
    unless params[:uris_seen].member? new_uri
      # request the new location just as we did the old one
      params[:max_redirects] -= 1
      response = GoGetter.get(new_uri, http_headers, params)
    end
  end
  response
end
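To see how the redirect budget and the visited-URI set interact, here is a sketch that replays the redirect logic against a hypothetical in-memory table of Location headers instead of live responses. REDIRECTS and resolve are illustrative names, not part of GoGetter, and resolve returns the URI where following stops rather than a response. URI#merge is the standard-library call used to resolve relative Locations.

```ruby
require 'set'
require 'uri'

# Hypothetical redirect table standing in for real HTTP responses:
# maps a URI string to the Location header it would return (nil = final page).
REDIRECTS = {
  "http://example.com/a"     => "/b",                       # path-only Location
  "http://example.com/b"     => "http://www.example.com/c", # absolute Location
  "http://www.example.com/c" => nil,
  "http://example.com/x"     => "/y",                       # x and y redirect
  "http://example.com/y"     => "/x"                        # to each other
}.freeze

# Follows redirects with a decrementing budget and a set of visited URIs,
# returning the URI at which following stops.
def resolve(uri, max_redirects: 1, seen: Set.new)
  location = REDIRECTS[uri.to_s]
  return uri if location.nil? || max_redirects <= 0
  seen << uri
  next_uri = uri.merge(location)          # relative Locations resolve against uri
  return uri if seen.include?(next_uri)   # loop detected: stop here
  resolve(next_uri, max_redirects: max_redirects - 1, seen: seen)
end

puts resolve(URI("http://example.com/a"))                   # http://example.com/b
puts resolve(URI("http://example.com/a"), max_redirects: 2) # http://www.example.com/c
puts resolve(URI("http://example.com/x"), max_redirects: 5) # http://example.com/y
```

With the default budget of 1 only the first hop is taken, and the x/y pair stops as soon as a URI would be revisited, even though budget remains.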
.parse_url(url) ⇒ Object
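Given a URL that may not be formatted properly, parse it into a URI: "http://" is prepended unless the URL already starts with http:// or https:// (case-insensitive), and an empty path with no query string is set to "/".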
# File 'lib/go_getter/go_getter.rb', line 46
def GoGetter.parse_url(url)
  unless (url =~ %r{^https?://}mi) == 0
    url = "http://#{url}"
  end
  uri = URI.parse url
  if uri.path.length == 0 and uri.query.nil?
    uri.path = "/"
  end
  uri
end
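A standalone copy of the parse_url logic, so the normalization can be tried without installing the gem. It anchors the pattern with \A instead of testing that a ^ match lands at index 0, which accepts the same inputs. The sample URLs are made-up examples.

```ruby
require 'uri'

# Standalone copy of GoGetter.parse_url's logic for experimentation.
def parse_url(url)
  # prepend a scheme unless the string already starts with http:// or https://
  url = "http://#{url}" unless url =~ %r{\Ahttps?://}i
  uri = URI.parse(url)
  # a bare host gets the root path, but only when there is no query string
  uri.path = "/" if uri.path.empty? && uri.query.nil?
  uri
end

puts parse_url("example.com")               # http://example.com/
puts parse_url("https://example.com/a?b=1") # https://example.com/a?b=1
puts parse_url("example.com?x=1")           # http://example.com?x=1
```

The last case shows that a host followed directly by a query keeps its empty path, because the "/" default applies only when uri.query is nil.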