Module: GScraper
- Defined in:
- lib/gscraper/page.rb,
lib/gscraper/hosts.rb,
lib/gscraper/version.rb,
lib/gscraper/gscraper.rb,
lib/gscraper/licenses.rb,
lib/gscraper/has_pages.rb,
lib/gscraper/languages.rb,
lib/gscraper/search/page.rb,
lib/gscraper/search/query.rb,
lib/gscraper/sponsored_ad.rb,
lib/gscraper/search/result.rb,
lib/gscraper/search/search.rb,
lib/gscraper/sponsored_links.rb,
lib/gscraper/search/web_query.rb,
lib/gscraper/search/ajax_query.rb,
lib/gscraper/search/exceptions/blocked.rb
Overview
GScraper - A web-scraping interface to various Google Services.
Copyright (c) 2007-2012 Hal Brodigan (postmodern.mod3 at gmail.com)
This program is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program; if not, write to the Free Software Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
Defined Under Namespace
Modules: HasPages, Hosts, Languages, Licenses, Search
Classes: Page, SponsoredAd, SponsoredLinks
Constant Summary
- VERSION = '0.4.0'
  The version of GScraper.
- COMMON_PROXY_PORT = 8080
  Common proxy port.
Class Method Summary
- .proxy ⇒ Hash
  The proxy information.
- .proxy_uri(proxy = self.proxy) ⇒ Object
  Creates an HTTP URI for the current proxy, or returns nil if no proxy host is set.
- .user_agent ⇒ String
  The GScraper User-Agent.
- .user_agent=(agent) ⇒ String
  Sets the GScraper User-Agent.
- .user_agent_alias=(name) ⇒ String
  Sets the GScraper User-Agent by looking up a User-Agent alias name.
- .user_agent_aliases ⇒ Hash{String => String}
  The supported GScraper User-Agent Aliases, keyed by alias name.
- .web_agent(options = {}) {|agent| ... } ⇒ Object
  Creates a new Mechanize agent.
Class Method Details
.proxy ⇒ Hash
The proxy information.
    # File 'lib/gscraper/gscraper.rb', line 34
    def self.proxy
      @@gscraper_proxy ||= {
        :host => nil,
        :port => COMMON_PROXY_PORT,
        :user => nil,
        :password => nil
      }
    end
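Because the Hash is memoized in a class variable, callers would configure the proxy by mutating the returned Hash in place. A minimal sketch of that pattern follows; `ProxySketch` is a hypothetical stand-in module (not GScraper itself) and the host name is a placeholder.

```ruby
# Hypothetical stand-in that mirrors the memoized proxy Hash shown above.
module ProxySketch
  COMMON_PROXY_PORT = 8080

  def self.proxy
    # Built once on first call; later calls return the same Hash object.
    @@sketch_proxy ||= {
      :host     => nil,
      :port     => COMMON_PROXY_PORT,
      :user     => nil,
      :password => nil
    }
  end
end

# Mutating the Hash persists, since every call returns the same object.
ProxySketch.proxy[:host] = 'proxy.example.com' # placeholder host
puts ProxySketch.proxy[:host]
puts ProxySketch.proxy[:port]
```

With the real library the equivalent call would presumably be `GScraper.proxy[:host] = ...`, given the listing above.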
.proxy_uri(proxy = self.proxy) ⇒ Object
Creates an HTTP URI for the current proxy. Returns nil if the proxy has no :host.
    # File 'lib/gscraper/gscraper.rb', line 61
    def self.proxy_uri(proxy=self.proxy)
      if proxy[:host]
        return URI::HTTP.build(
          :host => proxy[:host],
          :port => proxy[:port],
          :userinfo => "#{proxy[:user]}:#{proxy[:password]}",
          :path => '/'
        )
      end
    end
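The URI construction above can be tried with Ruby's standard `uri` library alone. The sketch below uses a hypothetical helper, `build_proxy_uri`, that mirrors the listing; the host and credentials are placeholders.

```ruby
require 'uri'

# Hypothetical helper mirroring proxy_uri; not part of GScraper.
def build_proxy_uri(proxy)
  # No host means no proxy is configured, so there is no URI to build.
  return nil unless proxy[:host]

  URI::HTTP.build(
    :host     => proxy[:host],
    :port     => proxy[:port],
    :userinfo => "#{proxy[:user]}:#{proxy[:password]}",
    :path     => '/'
  )
end

# Placeholder values, for illustration only.
proxy = {
  :host     => 'proxy.example.com',
  :port     => 8080,
  :user     => 'alice',
  :password => 'secret'
}

uri = build_proxy_uri(proxy)
puts uri.host
puts uri.port
puts uri.user
```

Note that when :user and :password are both nil, the interpolated userinfo string is ":" rather than empty, so a caller who wants a URI without credentials may prefer to omit :userinfo in that case.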
.user_agent ⇒ String
The GScraper User-Agent.
    # File 'lib/gscraper/gscraper.rb', line 86
    def self.user_agent
      @@gscraper_user_agent ||= self.user_agent_aliases['Windows IE 6']
    end
.user_agent=(agent) ⇒ String
Sets the GScraper User-Agent.
    # File 'lib/gscraper/gscraper.rb', line 99
    def self.user_agent=(agent)
      @@gscraper_user_agent = agent
    end
.user_agent_alias=(name) ⇒ String
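Sets the GScraper User-Agent by looking up a User-Agent alias name. The name is converted with to_s, so a Symbol is accepted as well as a String.
    # File 'lib/gscraper/gscraper.rb', line 112
    def self.user_agent_alias=(name)
      @@gscraper_user_agent = self.user_agent_aliases[name.to_s]
    end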
.user_agent_aliases ⇒ Hash{String => String}
The supported GScraper User-Agent Aliases, keyed by alias name.
    # File 'lib/gscraper/gscraper.rb', line 77
    def self.user_agent_aliases
      Mechanize::AGENT_ALIASES
    end
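The User-Agent accessors above share one class variable, which has a consequence worth spelling out: an unknown alias stores nil, and the next read of user_agent then falls back to the default through `||=`. The sketch below illustrates this with a hypothetical stand-in module; `UserAgentSketch` and its alias table are invented for illustration and do not reproduce Mechanize::AGENT_ALIASES.

```ruby
# Hypothetical stand-in mirroring the three accessors shown above.
module UserAgentSketch
  # Invented alias table; the real one comes from Mechanize::AGENT_ALIASES.
  ALIASES = {
    'Default Browser' => 'ExampleBrowser/1.0',
    'Other Browser'   => 'OtherBrowser/2.0'
  }

  def self.user_agent
    # Falls back to the default whenever the stored value is nil.
    @@sketch_user_agent ||= ALIASES['Default Browser']
  end

  def self.user_agent=(agent)
    @@sketch_user_agent = agent
  end

  def self.user_agent_alias=(name)
    # to_s lets callers pass a Symbol; an unknown name stores nil.
    @@sketch_user_agent = ALIASES[name.to_s]
  end
end

UserAgentSketch.user_agent_alias = :'Other Browser'
puts UserAgentSketch.user_agent

# An unknown alias stores nil, so the next read restores the default.
UserAgentSketch.user_agent_alias = 'No Such Alias'
puts UserAgentSketch.user_agent
```

In other words, under this pattern a mistyped alias name does not raise an error; it silently resets the User-Agent to the default.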
.web_agent(options = {}) {|agent| ... } ⇒ Object
Creates a new Mechanize agent.
    # File 'lib/gscraper/gscraper.rb', line 150
    def self.web_agent(options={})
      agent = Mechanize.new

      if options[:user_agent_alias]
        agent.user_agent_alias = options[:user_agent_alias]
      elsif options[:user_agent]
        agent.user_agent = options[:user_agent]
      elsif user_agent
        agent.user_agent = self.user_agent
      end

      proxy = (options[:proxy] || self.proxy)
      if proxy[:host]
        agent.set_proxy(proxy[:host],proxy[:port],proxy[:user],proxy[:password])
      end

      yield agent if block_given?
      return agent
    end
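The option precedence in web_agent (:user_agent_alias first, then :user_agent, then the module-wide default) and its optional block can be exercised without Mechanize. This is only a sketch: `FakeAgent` and `build_agent` are hypothetical stand-ins, and the two default arguments take the place of GScraper.user_agent and GScraper.proxy.

```ruby
# Hypothetical stand-in for a Mechanize agent; it only records what was set.
FakeAgent = Struct.new(:user_agent_alias, :user_agent, :proxy)

# Mirrors the option handling of web_agent shown above.
def build_agent(options, default_user_agent, default_proxy)
  agent = FakeAgent.new

  # Only the first matching branch applies.
  if options[:user_agent_alias]
    agent.user_agent_alias = options[:user_agent_alias]
  elsif options[:user_agent]
    agent.user_agent = options[:user_agent]
  elsif default_user_agent
    agent.user_agent = default_user_agent
  end

  # A per-call :proxy option overrides the default proxy Hash.
  proxy = (options[:proxy] || default_proxy)
  agent.proxy = proxy if proxy[:host]

  yield agent if block_given?
  agent
end

no_proxy = { :host => nil }

# The alias wins even though :user_agent is also given.
by_alias = build_agent(
  { :user_agent_alias => 'Mac Safari', :user_agent => 'Ignored/1.0' },
  'Default/1.0', no_proxy
)

# With no options, the default User-Agent applies and the block can adjust it.
by_block = build_agent({}, 'Default/1.0', no_proxy) do |agent|
  agent.user_agent = "#{agent.user_agent} (customized)"
end

puts by_alias.user_agent_alias
puts by_block.user_agent
```

The block runs after the User-Agent and proxy have been applied, so anything set inside it takes effect last.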