Module: FreeScrape
- Defined in:
  lib/free_scrape/item.rb,
  lib/free_scrape/version.rb,
  lib/free_scrape/category.rb,
  lib/free_scrape/item_link.rb,
  lib/free_scrape/free_scrape.rb
Defined Under Namespace
Classes: Category, Item, ItemLink
Constant Summary
- VERSION = '0.1.0'
- COMMON_PROXY_PORT = 8080
  Common proxy port
- DEFAULT_LANGUAGE = :en
  Default language
Class Method Summary
-
.item(descriptor) ⇒ Object
Returns the Item with the specified descriptor, which can be either a URI to freebase.com, an Item GUID or an Item name.
-
.language ⇒ Object
Returns the language used to access FreeScrape.
-
.language=(new_language) ⇒ Object
Sets the language used to access FreeScrape to new_language.
-
.open_page(uri, options = {}) ⇒ Object
Similar to FreeScrape.open_uri but returns an Hpricot document.
-
.open_uri(uri, options = {}) ⇒ Object
Opens the uri with the given options.
-
.proxy ⇒ Object
Returns the Hash of proxy information.
-
.proxy_uri(proxy_info = FreeScrape.proxy) ⇒ Object
Creates an HTTP URI from the given proxy_info hash.
-
.user_agent ⇒ Object
Returns the FreeScrape User-Agent.
-
.user_agent=(agent) ⇒ Object
Sets the FreeScrape User-Agent to the specified agent.
-
.user_agent_alias=(name) ⇒ Object
Sets the FreeScrape User-Agent using the specified user-agent alias name.
-
.user_agent_aliases ⇒ Object
Returns the supported FreeScrape User-Agent Aliases.
-
.web_agent(options = {}, &block) ⇒ Object
Creates a new WWW::Mechanize agent with the given options.
Class Method Details
.item(descriptor) ⇒ Object
Returns the Item with the specified descriptor, which can be either a URI to freebase.com, an Item GUID or an Item name.
FreeScrape.item('Aphex Twin')
# => #<FreeScrape::Item:0xb73fdba0 ...>
# File 'lib/free_scrape/free_scrape.rb', line 182
def FreeScrape.item(descriptor)
  Item.from(descriptor)
end
.language ⇒ Object
Returns the language used to access FreeScrape.
# File 'lib/free_scrape/free_scrape.rb', line 164
def FreeScrape.language
  @@free_scrape_language ||= DEFAULT_LANGUAGE
end
.language=(new_language) ⇒ Object
Sets the language used to access FreeScrape to new_language.
# File 'lib/free_scrape/free_scrape.rb', line 171
def FreeScrape.language=(new_language)
  @@free_scrape_language = new_language.to_sym
end
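The language getter and setter follow a lazily initialized default pattern. A minimal standalone sketch of that pattern, assuming a hypothetical `Settings` module in place of FreeScrape (the module name and the use of an instance variable rather than a class variable are illustrative choices, not the library's code):

```ruby
# Hypothetical stand-in for FreeScrape; illustrates the lazy-default pattern.
module Settings
  DEFAULT_LANGUAGE = :en

  # Returns the current language, falling back to the default on first use.
  def self.language
    @language ||= DEFAULT_LANGUAGE
  end

  # Accepts a String or Symbol and stores it as a Symbol.
  def self.language=(new_language)
    @language = new_language.to_sym
  end
end

Settings.language          # the default until a language is set
Settings.language = 'de'   # strings are normalized with to_sym
Settings.language
```

Normalizing with `to_sym` in the setter means callers can pass either `'de'` or `:de` and later comparisons only need to handle Symbols.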
.open_page(uri, options = {}) ⇒ Object
Similar to FreeScrape.open_uri but returns an Hpricot document.
# File 'lib/free_scrape/free_scrape.rb', line 119
def FreeScrape.open_page(uri, options = {})
  Hpricot(FreeScrape.open_uri(uri, options))
end
.open_uri(uri, options = {}) ⇒ Object
Opens the uri with the given options. The contents of the uri will be returned.
options may contain the following keys:
:user_agent_alias - The User-Agent Alias to use.
:user_agent - The User-Agent String to use.
:proxy - A Hash of proxy information, which may contain the following keys:
  :host - The proxy host.
  :port - The proxy port.
  :user - The user name to log in as.
  :password - The password to log in with.
FreeScrape.open_uri('http://www.hackety.org/')
FreeScrape.open_uri('http://tenderlovemaking.com/',
:user_agent_alias => 'Linux Mozilla')
FreeScrape.open_uri('http://www.wired.com/',
:user_agent => 'the future')
# File 'lib/free_scrape/free_scrape.rb', line 97
def FreeScrape.open_uri(uri, options = {})
  headers = {}

  if options[:user_agent_alias]
    headers['User-Agent'] = WWW::Mechanize::AGENT_ALIASES[options[:user_agent_alias]]
  elsif options[:user_agent]
    headers['User-Agent'] = options[:user_agent]
  elsif FreeScrape.user_agent
    headers['User-Agent'] = FreeScrape.user_agent
  end

  proxy = (options[:proxy] || FreeScrape.proxy)
  if proxy[:host]
    headers[:proxy] = FreeScrape.proxy_uri(proxy)
  end

  return Kernel.open(uri, headers)
end
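The User-Agent selection in open_uri applies a fixed precedence: an alias wins over an explicit string, which wins over the module-wide default. The following is a minimal sketch of just that precedence, runnable without Mechanize. The `AGENT_ALIASES` table, its single entry, and the `build_headers` helper are hypothetical stand-ins for illustration, not part of FreeScrape:

```ruby
# Hypothetical alias table; the library reads WWW::Mechanize::AGENT_ALIASES.
AGENT_ALIASES = {
  'Linux Mozilla' => 'Mozilla/5.0 (X11; Linux) ExampleBrowser/1.0'
}.freeze

# Picks the User-Agent header: alias first, then explicit string, then default.
def build_headers(options = {}, default_agent = nil)
  headers = {}

  if options[:user_agent_alias]
    headers['User-Agent'] = AGENT_ALIASES[options[:user_agent_alias]]
  elsif options[:user_agent]
    headers['User-Agent'] = options[:user_agent]
  elsif default_agent
    headers['User-Agent'] = default_agent
  end

  headers
end

build_headers({ :user_agent => 'the future' }, 'Default/1.0')
build_headers({ :user_agent_alias => 'Linux Mozilla', :user_agent => 'ignored' })
```

One consequence of this ordering is that an unknown alias yields a nil User-Agent rather than falling through to `:user_agent`, so callers may want to validate alias names first.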
.proxy ⇒ Object
Returns the Hash of proxy information.
# File 'lib/free_scrape/free_scrape.rb', line 18
def FreeScrape.proxy
  @@free_scrape_proxy ||= {
    :host => nil,
    :port => COMMON_PROXY_PORT,
    :user => nil,
    :password => nil
  }
end
.proxy_uri(proxy_info = FreeScrape.proxy) ⇒ Object
Creates an HTTP URI from the given proxy_info hash. The proxy_info hash defaults to FreeScrape.proxy if not given.
proxy_info may contain the following keys:
:host - The proxy host.
:port - The proxy port. Defaults to COMMON_PROXY_PORT if not specified.
:user - The user name to log in as.
:password - The password to log in with.
# File 'lib/free_scrape/free_scrape.rb', line 38
def FreeScrape.proxy_uri(proxy_info=FreeScrape.proxy)
  if FreeScrape.proxy[:host]
    return URI::HTTP.build(:host => FreeScrape.proxy[:host],
                           :port => FreeScrape.proxy[:port],
                           :userinfo => "#{FreeScrape.proxy[:user]}:#{FreeScrape.proxy[:password]}",
                           :path => '/')
  end
end
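To illustrate how a proxy hash maps onto a URI, here is a minimal standalone sketch using `URI::HTTP.build` from Ruby's standard library. The `proxy_uri_for` helper and the example host and credentials are hypothetical. Unlike the listing above, this sketch reads only from its argument and omits the userinfo part when no user is given:

```ruby
require 'uri'

COMMON_PROXY_PORT = 8080

# Builds an HTTP URI from a proxy hash; returns nil when no host is set.
def proxy_uri_for(proxy_info)
  return nil unless proxy_info[:host]

  # Only include credentials when a user name is present.
  userinfo = nil
  if proxy_info[:user]
    userinfo = [proxy_info[:user], proxy_info[:password]].compact.join(':')
  end

  URI::HTTP.build(
    :host     => proxy_info[:host],
    :port     => proxy_info[:port] || COMMON_PROXY_PORT,
    :userinfo => userinfo,
    :path     => '/'
  )
end

puts proxy_uri_for(:host => 'proxy.example.com', :user => 'alice', :password => 'secret')
```

Returning nil for a missing host lets callers use the result directly in a conditional, in the same way open_uri checks `proxy[:host]` before setting the proxy option.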
.user_agent ⇒ Object
Returns the FreeScrape User-Agent.
# File 'lib/free_scrape/free_scrape.rb', line 57
def FreeScrape.user_agent
  @@free_scrape_user_agent ||= FreeScrape.user_agent_aliases['Windows IE 6']
end
.user_agent=(agent) ⇒ Object
Sets the FreeScrape User-Agent to the specified agent.
# File 'lib/free_scrape/free_scrape.rb', line 64
def FreeScrape.user_agent=(agent)
  @@free_scrape_user_agent = agent
end
.user_agent_alias=(name) ⇒ Object
Sets the FreeScrape User-Agent using the specified user-agent alias name.
# File 'lib/free_scrape/free_scrape.rb', line 72
def FreeScrape.user_agent_alias=(name)
  @@free_scrape_user_agent = FreeScrape.user_agent_aliases[name.to_s]
end
.user_agent_aliases ⇒ Object
Returns the supported FreeScrape User-Agent Aliases.
# File 'lib/free_scrape/free_scrape.rb', line 50
def FreeScrape.user_agent_aliases
  WWW::Mechanize::AGENT_ALIASES
end
.web_agent(options = {}, &block) ⇒ Object
Creates a new WWW::Mechanize agent with the given options.
options may contain the following keys:
:user_agent_alias - The User-Agent Alias to use.
:user_agent - The User-Agent string to use.
:proxy - A Hash of proxy information, which may contain the following keys:
  :host - The proxy host.
  :port - The proxy port.
  :user - The user name to log in as.
  :password - The password to log in with.
FreeScrape.web_agent
FreeScrape.web_agent(:user_agent_alias => 'Linux Mozilla')
FreeScrape.web_agent(:user_agent => 'Google Bot')
# File 'lib/free_scrape/free_scrape.rb', line 141
def FreeScrape.web_agent(options = {}, &block)
  agent = WWW::Mechanize.new

  if options[:user_agent_alias]
    agent.user_agent_alias = options[:user_agent_alias]
  elsif options[:user_agent]
    agent.user_agent = options[:user_agent]
  elsif FreeScrape.user_agent
    agent.user_agent = FreeScrape.user_agent
  end

  proxy = (options[:proxy] || FreeScrape.proxy)
  if proxy[:host]
    agent.set_proxy(proxy[:host], proxy[:port], proxy[:user], proxy[:password])
  end

  block.call(agent) if block
  return agent
end
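web_agent configures an agent from options, passes it to an optional block, and then returns it. The sketch below shows that configure, yield, and return shape without requiring Mechanize. `FakeAgent`, `build_agent`, and the default agent string are hypothetical stand-ins, and the option handling is simplified to the `:user_agent` and `:proxy` keys:

```ruby
# Hypothetical stand-in for WWW::Mechanize with only the attributes used here.
FakeAgent = Struct.new(:user_agent, :proxy)

# Builds an agent, applies options, yields it to an optional block, returns it.
def build_agent(options = {}, default_agent = 'DefaultAgent/1.0')
  agent = FakeAgent.new
  agent.user_agent = options[:user_agent] || default_agent

  # A proxy is applied only when a host is present, as in the listing above.
  proxy = options[:proxy] || {}
  agent.proxy = [proxy[:host], proxy[:port]] if proxy[:host]

  yield agent if block_given?
  agent
end

agent = build_agent(:user_agent => 'Google Bot') do |a|
  # Per-call customization happens inside the block.
  a.proxy = ['proxy.example.com', 8080]
end
```

Because the block runs after the options are applied, anything set inside it overrides the option-derived configuration.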