Class: Mechanize
- Inherits: Object
- Defined in: lib/mechanize.rb
Overview
The Mechanize library automates interaction with websites. It can follow links, populate form fields, and submit forms. A history of visited URLs is maintained and can be queried.
Example
require 'mechanize'
require 'logger'
agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari'
page = agent.get "http://www.google.com/"
search_form = page.form_with :name => "f"
search_form.field_with(:name => "q").value = "Hello"
search_results = agent.submit search_form
puts search_results.body
Defined Under Namespace
Modules: ElementMatcher, Parser
Classes: ContentTypeError, Cookie, CookieJar, Download, Error, File, FileConnection, FileRequest, FileResponse, FileSaver, Form, HTTP, Headers, History, Page, PluggableParser, RedirectLimitReachedError, RedirectNotGetOrHeadError, ResponseCodeError, ResponseReadError, RobotsDisallowedError, TestCase, UnauthorizedError, UnsupportedSchemeError, Util
Constant Summary
- VERSION =
The version of Mechanize you are using.
'2.1'
- AGENT_ALIASES =
Supported User-Agent aliases for use with user_agent_alias=. The description in parentheses is for informative purposes and is not part of the alias name.
-
Linux Firefox (3.6.1)
-
Linux Konqueror (3)
-
Linux Mozilla
-
Mac FireFox (3.6)
-
Mac Mozilla
-
Mac Safari (5)
-
Mac Safari 4
-
Mechanize (default)
-
Windows IE 6
-
Windows IE 7
-
Windows IE 8
-
Windows IE 9
-
Windows Mozilla
-
iPhone (3.0)
Example:
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
-
{
  'Mechanize' => "Mechanize/#{VERSION} Ruby/#{ruby_version} (http://github.com/tenderlove/mechanize/)",
  'Linux Firefox' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.1) Gecko/20100122 firefox/3.6.1',
  'Linux Konqueror' => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)',
  'Linux Mozilla' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',
  'Mac FireFox' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.6; en-US; rv:1.9.2) Gecko/20100115 Firefox/3.6',
  'Mac Mozilla' => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401',
  'Mac Safari 4' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; de-at) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10',
  'Mac Safari' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.22 (KHTML, like Gecko) Version/5.1.1 Safari/534.51.22',
  'Windows IE 6' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
  'Windows IE 7' => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
  'Windows IE 8' => 'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
  'Windows IE 9' => 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
  'Windows Mozilla' => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6',
  'iPhone' => 'Mozilla/5.0 (iPhone; U; CPU like Mac OS X; en) AppleWebKit/420+ (KHTML, like Gecko) Version/3.0 Mobile/1C28 Safari/419.3',
}
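Because the alias table is a plain Hash, alias names are matched exactly, including letter case. The following is a minimal sketch of such a lookup, not Mechanize's own code: ALIASES is a two-entry copy of the table above and resolve_alias is a hypothetical helper introduced only for illustration.

```ruby
# Hypothetical two-entry copy of the alias table shown above.
ALIASES = {
  'Mac Safari'   => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_7_2) AppleWebKit/534.51.22 (KHTML, like Gecko) Version/5.1.1 Safari/534.51.22',
  'Windows IE 9' => 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
}.freeze

# Illustrative helper (not part of Mechanize): map an alias name to its
# User-Agent string, failing loudly when the name is not a key of the table.
def resolve_alias(name)
  ALIASES.fetch(name) do
    raise ArgumentError, "unknown agent alias: #{name.inspect}"
  end
end

ie9_agent = resolve_alias('Windows IE 9')
```

In this sketch a lookup with different capitalization, such as 'mac safari', raises instead of silently falling back to a default.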
Class Attribute Summary
-
.html_parser ⇒ Object
Default HTML parser for all mechanize instances.
-
.log ⇒ Object
Default logger for all mechanize instances.
Instance Attribute Summary
-
#agent ⇒ Object
readonly
:section: Utilities.
-
#default_encoding ⇒ Object
A default encoding name used when parsing HTML.
-
#force_default_encoding ⇒ Object
Overrides the encodings given by the HTTP server and the HTML page with the default_encoding when set to true.
-
#history_added ⇒ Object
Callback which is invoked with the page that was added to history.
-
#html_parser ⇒ Object
The HTML parser to be used when parsing documents.
-
#keep_alive_time ⇒ Object
HTTP/1.0 keep-alive time.
-
#pluggable_parser ⇒ Object
readonly
:nodoc:.
-
#proxy_addr ⇒ Object
readonly
The HTTP proxy address.
-
#proxy_pass ⇒ Object
readonly
The HTTP proxy password.
-
#proxy_port ⇒ Object
readonly
The HTTP proxy port.
-
#proxy_user ⇒ Object
readonly
The HTTP proxy username.
-
#watch_for_set ⇒ Object
The value of watch_for_set is passed to pluggable parsers for retrieved content.
Class Method Summary
-
.inherited(child) ⇒ Object
:nodoc:.
Instance Method Summary
-
#auth(user, password) ⇒ Object
(also: #basic_auth)
Sets the user and password to be used for HTTP authentication.
-
#back ⇒ Object
Equivalent to the browser back button.
-
#ca_file ⇒ Object
Path to an OpenSSL server certificate file.
-
#ca_file=(ca_file) ⇒ Object
Sets the certificate file used for SSL connections.
-
#cert ⇒ Object
An OpenSSL client certificate or the path to a certificate file.
-
#cert=(cert) ⇒ Object
Sets the OpenSSL client certificate cert to the given path or certificate instance.
-
#cert_store ⇒ Object
An OpenSSL certificate store for verifying server certificates.
-
#cert_store=(cert_store) ⇒ Object
Sets the OpenSSL certificate store to cert_store.
-
#certificate ⇒ Object
What is this?.
-
#click(link) ⇒ Object
If the parameter is a string, finds the button or link with the value of the string on the current page and clicks it.
-
#conditional_requests ⇒ Object
Are If-Modified-Since conditional requests enabled?
-
#conditional_requests=(enabled) ⇒ Object
Enables or disables If-Modified-Since conditional requests (enabled by default).
-
#content_encoding_hooks ⇒ Object
A list of hooks to call before reading response header ‘content-encoding’.
-
#cookie_jar ⇒ Object
A Mechanize::CookieJar which stores cookies.
-
#cookie_jar=(cookie_jar) ⇒ Object
Replaces the cookie jar with cookie_jar.
-
#cookies ⇒ Object
Returns a list of cookies stored in the cookie jar.
-
#current_page ⇒ Object
(also: #page)
Returns the latest page loaded by Mechanize.
-
#delete(uri, query_params = {}, headers = {}) ⇒ Object
DELETE uri with query_params, and setting headers.
-
#follow_meta_refresh ⇒ Object
Follow HTML meta refresh and HTTP Refresh headers.
-
#follow_meta_refresh=(follow) ⇒ Object
Controls following of HTML meta refresh and HTTP Refresh headers in responses.
-
#follow_meta_refresh_self ⇒ Object
Follow an HTML meta refresh and HTTP Refresh headers that have no “url=” in the content attribute.
-
#follow_meta_refresh_self=(follow) ⇒ Object
Alters the following of HTML meta refresh and HTTP Refresh headers that point to the same page.
-
#get(uri, parameters = [], referer = nil, headers = {}) {|page| ... } ⇒ Object
GET the uri with the given request parameters, referer and headers.
-
#get_file(url) ⇒ Object
GET url and return only its contents.
-
#gzip_enabled ⇒ Object
Is gzip compression of responses enabled?
-
#gzip_enabled=(enabled) ⇒ Object
Enables or disables HTTP/1.1 gzip compression (enabled by default).
-
#head(uri, query_params = {}, headers = {}) {|page| ... } ⇒ Object
HEAD uri with query_params, and setting headers.
-
#history ⇒ Object
The history of this mechanize run.
-
#idle_timeout ⇒ Object
Connections that have not been used in this many seconds will be reset.
-
#idle_timeout=(idle_timeout) ⇒ Object
Sets the idle timeout to idle_timeout.
-
#initialize {|_self| ... } ⇒ Mechanize
constructor
Creates a new mechanize instance.
-
#keep_alive ⇒ Object
Are HTTP/1.1 keep-alive connections enabled?
-
#keep_alive=(enable) ⇒ Object
Disable HTTP/1.1 keep-alive connections if enable is set to false.
-
#key ⇒ Object
An OpenSSL private key or the path to a private key.
-
#key=(key) ⇒ Object
Sets the OpenSSL client key to the given path or key instance.
-
#log ⇒ Object
The current logger.
-
#log=(logger) ⇒ Object
Sets the logger used by this instance of mechanize.
-
#max_file_buffer ⇒ Object
Responses larger than this will be written to a Tempfile instead of stored in memory.
-
#max_file_buffer=(bytes) ⇒ Object
Sets the maximum size of a response body that will be stored in memory to bytes.
-
#max_history ⇒ Object
Maximum number of items allowed in the history.
-
#max_history=(length) ⇒ Object
Sets the maximum number of items allowed in the history to length.
-
#open_timeout ⇒ Object
Length of time to wait until a connection is opened in seconds.
-
#open_timeout=(open_timeout) ⇒ Object
Sets the connection open timeout to open_timeout.
-
#parse(uri, response, body) ⇒ Object
Parses the body of the response from uri using the pluggable parser that matches its content type.
-
#pass ⇒ Object
OpenSSL client key password.
-
#pass=(pass) ⇒ Object
Sets the client key password to pass.
-
#post(uri, query = {}, headers = {}) ⇒ Object
POST to the given uri with the given query.
-
#post_connect_hooks ⇒ Object
A list of hooks to call after retrieving a response.
-
#pre_connect_hooks ⇒ Object
A list of hooks to call before making a request.
-
#pretty_print(q) ⇒ Object
:nodoc:.
-
#put(uri, entity, headers = {}) ⇒ Object
PUT to uri with entity, and setting headers.
-
#read_timeout ⇒ Object
Length of time to wait for data from the server.
-
#read_timeout=(read_timeout) ⇒ Object
Sets the timeout for each chunk of data read from the server to read_timeout.
-
#redirect_ok ⇒ Object
(also: #follow_redirect?)
Controls how mechanize deals with redirects.
-
#redirect_ok=(follow) ⇒ Object
Sets the mechanize redirect handling policy.
-
#redirection_limit ⇒ Object
Maximum number of redirections to follow.
-
#redirection_limit=(limit) ⇒ Object
Sets the maximum number of redirections to follow to limit.
-
#request_headers ⇒ Object
A hash of custom request headers that will be sent on every request.
-
#request_headers=(request_headers) ⇒ Object
Replaces the custom request headers that will be sent on every request with request_headers.
-
#request_with_entity(verb, uri, entity, headers = {}) ⇒ Object
Makes an HTTP request to uri using HTTP method verb.
-
#retry_change_requests ⇒ Object
Retry POST and other non-idempotent requests.
-
#retry_change_requests=(retry_change_requests) ⇒ Object
When setting retry_change_requests to true you are stating that, for all the URLs you access with mechanize, making POST and other non-idempotent requests is safe and will not cause data duplication or other harmful results.
-
#robots ⇒ Object
Will /robots.txt files be obeyed?
-
#robots=(enabled) ⇒ Object
When enabled, mechanize will retrieve and obey robots.txt files.
-
#scheme_handlers ⇒ Object
The handlers for HTTP and other URI protocols.
-
#scheme_handlers=(scheme_handlers) ⇒ Object
Replaces the URI scheme handler table with scheme_handlers.
-
#set_proxy(address, port, user = nil, password = nil) ⇒ Object
Sets the proxy address at port with an optional user and password.
-
#submit(form, button = nil, headers = {}) ⇒ Object
Submits form with an optional button.
-
#transact ⇒ Object
Runs given block, then resets the page history as it was before.
-
#user_agent ⇒ Object
The identification string for the client initiating a web request.
-
#user_agent=(user_agent) ⇒ Object
Sets the User-Agent used by mechanize to user_agent.
-
#user_agent_alias=(name) ⇒ Object
Set the user agent for the Mechanize object based on the given name.
-
#verify_callback ⇒ Object
A callback for additional certificate verification.
-
#verify_callback=(verify_callback) ⇒ Object
Sets the OpenSSL certificate verification callback.
-
#verify_mode ⇒ Object
The OpenSSL server certificate verification method.
-
#verify_mode=(verify_mode) ⇒ Object
Sets the OpenSSL server certificate verification method.
-
#visited?(url) ⇒ Boolean
(also: #visited_page)
Returns a visited page for the url passed in, otherwise nil.
Constructor Details
#initialize {|_self| ... } ⇒ Mechanize
Creates a new mechanize instance. If a block is given, the created instance is yielded to the block for setting up pre-connection state such as SSL parameters or proxies:
agent = Mechanize.new do |a|
  a.set_proxy 'proxy.example', 8080
end
# File 'lib/mechanize.rb', line 114
def initialize
  @agent = Mechanize::HTTP::Agent.new
  @agent.context = self
  @log = nil

  # attr_accessors
  @agent.user_agent = AGENT_ALIASES['Mechanize']
  @watch_for_set = nil
  @history_added = nil

  # attr_readers
  @pluggable_parser = PluggableParser.new
  @keep_alive_time = 0

  # Proxy
  @proxy_addr = nil
  @proxy_port = nil
  @proxy_user = nil
  @proxy_pass = nil

  @html_parser = self.class.html_parser

  @default_encoding = nil
  @force_default_encoding = false

  yield self if block_given?

  @agent.set_proxy @proxy_addr, @proxy_port, @proxy_user, @proxy_pass
  @agent.set_http
end
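The constructor yields the new instance before it applies the proxy settings, so anything configured inside the block is already in place when the connection is set up. The sketch below reproduces only that ordering. TinyAgent and its applied attribute are hypothetical stand-ins and not part of Mechanize.

```ruby
# Hypothetical stand-in for the configure-then-apply ordering shown above.
class TinyAgent
  attr_reader :proxy_addr, :proxy_port, :applied

  def initialize
    @proxy_addr = nil
    @proxy_port = nil

    # Let the caller configure the instance first...
    yield self if block_given?

    # ...then apply whatever state the block left behind, exactly once.
    @applied = [@proxy_addr, @proxy_port]
  end

  def set_proxy(address, port)
    @proxy_addr = address
    @proxy_port = port
  end
end

configured   = TinyAgent.new { |a| a.set_proxy 'proxy.example', 8080 }
unconfigured = TinyAgent.new
```

Here configured.applied holds the values set inside the block, while unconfigured.applied holds the nil defaults.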
Class Attribute Details
.html_parser ⇒ Object
Default HTML parser for all mechanize instances
Mechanize.html_parser = Nokogiri::XML
# File 'lib/mechanize.rb', line 464
def html_parser
  @html_parser
end
.log ⇒ Object
Default logger for all mechanize instances
Mechanize.log = Logger.new $stderr
# File 'lib/mechanize.rb', line 471
def log
  @log
end
Instance Attribute Details
#agent ⇒ Object (readonly)
:section: Utilities
# File 'lib/mechanize.rb', line 959
def agent
  @agent
end
#default_encoding ⇒ Object
A default encoding name used when parsing HTML. When set it is used after any other encoding. The default is nil.
# File 'lib/mechanize.rb', line 479
def default_encoding
  @default_encoding
end
#force_default_encoding ⇒ Object
Overrides the encodings given by the HTTP server and the HTML page with the default_encoding when set to true.
# File 'lib/mechanize.rb', line 485
def force_default_encoding
  @force_default_encoding
end
#history_added ⇒ Object
Callback which is invoked with the page that was added to history.
# File 'lib/mechanize.rb', line 218
def history_added
  @history_added
end
#html_parser ⇒ Object
The HTML parser to be used when parsing documents
# File 'lib/mechanize.rb', line 490
def html_parser
  @html_parser
end
#keep_alive_time ⇒ Object
HTTP/1.0 keep-alive time. This is no longer supported by mechanize, which now uses net-http-persistent and therefore only supports HTTP/1.1 persistent connections.
# File 'lib/mechanize.rb', line 497
def keep_alive_time
  @keep_alive_time
end
#pluggable_parser ⇒ Object (readonly)
:nodoc:
# File 'lib/mechanize.rb', line 961
def pluggable_parser
  @pluggable_parser
end
#proxy_addr ⇒ Object (readonly)
The HTTP proxy address
# File 'lib/mechanize.rb', line 502
def proxy_addr
  @proxy_addr
end
#proxy_pass ⇒ Object (readonly)
The HTTP proxy password
# File 'lib/mechanize.rb', line 507
def proxy_pass
  @proxy_pass
end
#proxy_port ⇒ Object (readonly)
The HTTP proxy port
# File 'lib/mechanize.rb', line 512
def proxy_port
  @proxy_port
end
#proxy_user ⇒ Object (readonly)
The HTTP proxy username
# File 'lib/mechanize.rb', line 517
def proxy_user
  @proxy_user
end
#watch_for_set ⇒ Object
The value of watch_for_set is passed to pluggable parsers for retrieved content
# File 'lib/mechanize.rb', line 834
def watch_for_set
  @watch_for_set
end
Class Method Details
.inherited(child) ⇒ Object
:nodoc:
# File 'lib/mechanize.rb', line 98
def self.inherited(child) # :nodoc:
  child.html_parser ||= html_parser
  child.log ||= log
  super
end
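The inherited hook copies the class-level defaults into each subclass at the moment the subclass is defined. A minimal, self-contained sketch of that pattern follows. Base, Child and the symbol values are hypothetical and stand in for Mechanize and real parser objects.

```ruby
# Hypothetical classes illustrating the inherited-hook pattern shown above.
class Base
  class << self
    attr_accessor :html_parser, :log

    # Called by Ruby when a subclass of Base is created.
    def inherited(child)
      child.html_parser ||= html_parser
      child.log ||= log
      super
    end
  end
end

Base.html_parser = :default_parser # placeholder for a real parser object

class Child < Base; end

copied_at_definition = Child.html_parser

# A later change to the parent is not propagated to existing subclasses.
Base.html_parser = :other_parser
after_parent_change = Child.html_parser
```

One consequence of copying at definition time, at least in this sketch, is that a subclass keeps the value it received even if the parent's default is changed later.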
Instance Method Details
#auth(user, password) ⇒ Object Also known as: basic_auth
Sets the user and password to be used for HTTP authentication.
# File 'lib/mechanize.rb', line 522
def auth(user, password)
  @agent.user = user
  @agent.password = password
end
#back ⇒ Object
Equivalent to the browser back button. Returns the previous page visited.
# File 'lib/mechanize.rb', line 153
def back
  @agent.history.pop
end
#ca_file ⇒ Object
Path to an OpenSSL server certificate file
# File 'lib/mechanize.rb', line 844
def ca_file
  @agent.ca_file
end
#ca_file=(ca_file) ⇒ Object
Sets the certificate file used for SSL connections
# File 'lib/mechanize.rb', line 851
def ca_file= ca_file
  @agent.ca_file = ca_file
end
#cert ⇒ Object
An OpenSSL client certificate or the path to a certificate file.
# File 'lib/mechanize.rb', line 858
def cert
  @agent.cert
end
#cert=(cert) ⇒ Object
Sets the OpenSSL client certificate cert to the given path or certificate instance
# File 'lib/mechanize.rb', line 866
def cert= cert
  @agent.cert = cert
end
#cert_store ⇒ Object
An OpenSSL certificate store for verifying server certificates. This defaults to the default certificate store.
# File 'lib/mechanize.rb', line 874
def cert_store
  @agent.cert_store
end
#cert_store=(cert_store) ⇒ Object
Sets the OpenSSL certificate store to cert_store.
# File 'lib/mechanize.rb', line 881
def cert_store= cert_store
  @agent.cert_store = cert_store
end
#certificate ⇒ Object
What is this?
Why is it different from #cert?
# File 'lib/mechanize.rb', line 890
def certificate # :nodoc:
  @agent.certificate
end
#click(link) ⇒ Object
If the parameter is a string, finds the button or link with the value of the string on the current page and clicks it. Otherwise, clicks the Mechanize::Page::Link object passed in. Returns the page fetched.
# File 'lib/mechanize.rb', line 245
def click link
  case link
  when Page::Link then
    referer = link.page || current_page()
    if @agent.robots
      if (referer.is_a?(Page) and referer.parser.nofollow?) or
         link.rel?('nofollow') then
        raise RobotsDisallowedError.new(link.href)
      end
    end
    if link.rel?('noreferrer')
      href = @agent.resolve(link.href, link.page || current_page)
      referer = Page.new(nil, {'content-type'=>'text/html'})
    else
      href = link.href
    end
    get href, [], referer
  when String, Regexp then
    if real_link = page.link_with(:text => link)
      click real_link
    else
      button = nil
      form = page.forms.find do |f|
        button = f.button_with(:value => link)
        button.is_a? Form::Submit
      end
      submit form, button if form
    end
  else
    referer = current_page()
    href = link.respond_to?(:href) ? link.href :
      (link['href'] || link['src'])
    get href, [], referer
  end
end
#conditional_requests ⇒ Object
Are If-Modified-Since conditional requests enabled?
# File 'lib/mechanize.rb', line 532
def conditional_requests
  @agent.conditional_requests
end
#conditional_requests=(enabled) ⇒ Object
Enables or disables If-Modified-Since conditional requests (enabled by default)
# File 'lib/mechanize.rb', line 539
def conditional_requests= enabled
  @agent.conditional_requests = enabled
end
#content_encoding_hooks ⇒ Object
A list of hooks to call before reading response header ‘content-encoding’.
Each hook is called with the agent making the request, the URI of the request, the response, and an IO containing the response body.
# File 'lib/mechanize.rb', line 211
def content_encoding_hooks
  @agent.content_encoding_hooks
end
#cookie_jar ⇒ Object
A Mechanize::CookieJar which stores cookies
# File 'lib/mechanize.rb', line 546
def cookie_jar
  @agent.cookie_jar
end
#cookie_jar=(cookie_jar) ⇒ Object
Replaces the cookie jar with cookie_jar
# File 'lib/mechanize.rb', line 553
def cookie_jar= cookie_jar
  @agent.cookie_jar = cookie_jar
end
#cookies ⇒ Object
Returns a list of cookies stored in the cookie jar.
# File 'lib/mechanize.rb', line 560
def cookies
  @agent.cookie_jar.to_a
end
#current_page ⇒ Object Also known as: page
Returns the latest page loaded by Mechanize
# File 'lib/mechanize.rb', line 160
def current_page
  @agent.current_page
end
#delete(uri, query_params = {}, headers = {}) ⇒ Object
DELETE uri with query_params, and setting headers:
delete('http://example/', {'q' => 'foo'}, {})
# File 'lib/mechanize.rb', line 286
def delete(uri, query_params = {}, headers = {})
  page = @agent.fetch(uri, :delete, headers, query_params)
  add_to_history(page)
  page
end
#follow_meta_refresh ⇒ Object
Follow HTML meta refresh and HTTP Refresh headers. If set to :anywhere, meta refresh tags outside of the head element will be followed.
# File 'lib/mechanize.rb', line 568
def follow_meta_refresh
  @agent.follow_meta_refresh
end
#follow_meta_refresh=(follow) ⇒ Object
Controls following of HTML meta refresh and HTTP Refresh headers in responses.
# File 'lib/mechanize.rb', line 576
def follow_meta_refresh= follow
  @agent.follow_meta_refresh = follow
end
#follow_meta_refresh_self ⇒ Object
Follow an HTML meta refresh and HTTP Refresh headers that have no “url=” in the content attribute.
Defaults to false to prevent infinite refresh loops.
# File 'lib/mechanize.rb', line 586
def follow_meta_refresh_self
  @agent.follow_meta_refresh_self
end
#follow_meta_refresh_self=(follow) ⇒ Object
Alters the following of HTML meta refresh and HTTP Refresh headers that point to the same page.
# File 'lib/mechanize.rb', line 594
def follow_meta_refresh_self= follow
  @agent.follow_meta_refresh_self = follow
end
#get(uri, parameters = [], referer = nil, headers = {}) {|page| ... } ⇒ Object
GET the uri with the given request parameters, referer and headers.
The referer may be a URI or a page.
# File 'lib/mechanize.rb', line 298
def get(uri, parameters = [], referer = nil, headers = {})
  method = :get

  referer ||=
    if uri.to_s =~ %r{\Ahttps?://}
      Page.new(nil, {'content-type'=>'text/html'})
    else
      current_page || Page.new(nil, {'content-type'=>'text/html'})
    end

  # FIXME: Huge hack so that using a URI as a referer works.  I need to
  # refactor everything to pass around URIs but still support
  # Mechanize::Page#base
  unless Mechanize::Parser === referer then
    referer = referer.is_a?(String) ?
      Page.new(URI.parse(referer), {'content-type' => 'text/html'}) :
      Page.new(referer, {'content-type' => 'text/html'})
  end

  # fetch the page
  headers ||= {}
  page = @agent.fetch uri, method, headers, parameters, referer
  add_to_history(page)
  yield page if block_given?
  page
end
#get_file(url) ⇒ Object
GET url and return only its contents
# File 'lib/mechanize.rb', line 328
def get_file(url)
  get(url).body
end
#gzip_enabled ⇒ Object
Is gzip compression of responses enabled?
# File 'lib/mechanize.rb', line 601
def gzip_enabled
  @agent.gzip_enabled
end
#gzip_enabled=(enabled) ⇒ Object
Enables or disables HTTP/1.1 gzip compression (enabled by default)
# File 'lib/mechanize.rb', line 608
def gzip_enabled=enabled
  @agent.gzip_enabled = enabled
end
#head(uri, query_params = {}, headers = {}) {|page| ... } ⇒ Object
HEAD uri with query_params, and setting headers:
head('http://example/', {'q' => 'foo'}, {})
# File 'lib/mechanize.rb', line 337
def head(uri, query_params = {}, headers = {})
  # fetch the page
  page = @agent.fetch(uri, :head, headers, query_params)
  yield page if block_given?
  page
end
#history ⇒ Object
The history of this mechanize run
# File 'lib/mechanize.rb', line 169
def history
  @agent.history
end
#idle_timeout ⇒ Object
Connections that have not been used in this many seconds will be reset.
# File 'lib/mechanize.rb', line 615
def idle_timeout
  @agent.idle_timeout
end
#idle_timeout=(idle_timeout) ⇒ Object
Sets the idle timeout to idle_timeout. The default timeout is 5 seconds. If you experience “too many connection resets”, reducing this value may help.
# File 'lib/mechanize.rb', line 623
def idle_timeout= idle_timeout
  @agent.idle_timeout = idle_timeout
end
#keep_alive ⇒ Object
Are HTTP/1.1 keep-alive connections enabled?
# File 'lib/mechanize.rb', line 630
def keep_alive
  @agent.keep_alive
end
#keep_alive=(enable) ⇒ Object
Disable HTTP/1.1 keep-alive connections if enable is set to false. If you are experiencing “too many connection resets” errors, setting this to false will eliminate them.
You should first investigate reducing idle_timeout.
# File 'lib/mechanize.rb', line 641
def keep_alive= enable
  @agent.keep_alive = enable
end
#key ⇒ Object
An OpenSSL private key or the path to a private key
# File 'lib/mechanize.rb', line 897
def key
  @agent.key
end
#key=(key) ⇒ Object
Sets the OpenSSL client key to the given path or key instance
# File 'lib/mechanize.rb', line 904
def key= key
  @agent.key = key
end
#log ⇒ Object
The current logger. If no logger has been set Mechanize.log is used.
# File 'lib/mechanize.rb', line 648
def log
  @log || Mechanize.log
end
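The reader above falls back to the class-wide logger whenever no per-instance logger has been set. The sketch below shows the same fallback using only the standard library. Client is a hypothetical class standing in for Mechanize, and the loggers write to in-memory StringIO objects.

```ruby
require 'logger'
require 'stringio'

# Hypothetical class illustrating the instance-then-class logger fallback.
class Client
  class << self
    attr_accessor :log # class-wide default logger
  end

  attr_writer :log # per-instance override

  def log
    @log || self.class.log
  end
end

shared_logger = Logger.new(StringIO.new)
Client.log = shared_logger

plain_client = Client.new        # uses the class-wide logger
custom_logger = Logger.new(StringIO.new)
custom_client = Client.new
custom_client.log = custom_logger # uses its own logger
```

In this sketch, setting an instance logger affects only that instance, and every other instance continues to share the class-wide one.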
#log=(logger) ⇒ Object
Sets the logger used by this instance of mechanize
# File 'lib/mechanize.rb', line 655
def log= logger
  @log = logger
end
#max_file_buffer ⇒ Object
Responses larger than this will be written to a Tempfile instead of stored in memory. The default is 10240 bytes
# File 'lib/mechanize.rb', line 663
def max_file_buffer
  @agent.max_file_buffer
end
#max_file_buffer=(bytes) ⇒ Object
Sets the maximum size of a response body that will be stored in memory to bytes
# File 'lib/mechanize.rb', line 671
def max_file_buffer= bytes
  @agent.max_file_buffer = bytes
end
#max_history ⇒ Object
Maximum number of items allowed in the history.
# File 'lib/mechanize.rb', line 176
def max_history
  @agent.history.max_size
end
#max_history=(length) ⇒ Object
Sets the maximum number of items allowed in the history to length.
# File 'lib/mechanize.rb', line 183
def max_history= length
  @agent.history.max_size = length
end
#open_timeout ⇒ Object
Length of time to wait until a connection is opened in seconds
# File 'lib/mechanize.rb', line 678
def open_timeout
  @agent.open_timeout
end
#open_timeout=(open_timeout) ⇒ Object
Sets the connection open timeout to open_timeout
# File 'lib/mechanize.rb', line 685
def open_timeout= open_timeout
  @agent.open_timeout = open_timeout
end
#parse(uri, response, body) ⇒ Object
Parses the body of the response from uri using the pluggable parser that matches its content type
# File 'lib/mechanize.rb', line 967
def parse uri, response, body
  content_type = nil

  unless response['Content-Type'].nil?
    data, = response['Content-Type'].split ';', 2
    content_type, = data.downcase.split ',', 2 unless data.nil?
  end

  # Find our pluggable parser
  parser_klass = @pluggable_parser.parser content_type

  unless parser_klass <= Mechanize::Download then
    body = case body
           when IO, Tempfile, StringIO then
             body.read
           else
             body
           end
  end

  parser_klass.new uri, response, body, response.code do |parser|
    parser.mech = self if parser.respond_to? :mech=

    parser.watch_for_set = @watch_for_set if
      @watch_for_set and parser.respond_to?(:watch_for_set=)
  end
end
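The first step of parse reduces the Content-Type header to a bare, lower-case media type before a parser is chosen. The sketch below isolates that step so it can be tried without a response object. media_type is a hypothetical helper name, and it takes the header value directly rather than a response.

```ruby
# Illustrative helper (not part of Mechanize): reduce a Content-Type header
# value to its media type using the same two splits as the source above.
def media_type(header)
  return nil if header.nil?

  # Drop parameters such as "; charset=utf-8".
  data, = header.split(';', 2)
  return nil if data.nil? # an empty header splits into an empty array

  # Keep only the first type if several are listed with commas.
  type, = data.downcase.split(',', 2)
  type
end

html_type    = media_type('Text/HTML; charset=UTF-8')
first_type   = media_type('text/html, text/plain')
missing_type = media_type(nil)
empty_type   = media_type('')
```

Note that, as written, neither the sketch nor the split calls it mirrors strip surrounding whitespace from the result.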
#pass ⇒ Object
OpenSSL client key password
# File 'lib/mechanize.rb', line 911
def pass
  @agent.pass
end
#pass=(pass) ⇒ Object
Sets the client key password to pass
# File 'lib/mechanize.rb', line 918
def pass= pass
  @agent.pass = pass
end
#post(uri, query = {}, headers = {}) ⇒ Object
POST to the given uri with the given query. The query is specified by either a string, or a list of key-value pairs represented by a hash or an array of arrays.
Examples:
agent.post 'http://example.com/', "foo" => "bar"
agent.post 'http://example.com/', [%w[foo bar]]
agent.post('http://example.com/', "<message>hello</message>",
'Content-Type' => 'application/xml')
# File 'lib/mechanize.rb', line 357
def post(uri, query={}, headers={})
  return request_with_entity(:post, uri, query, headers) if String === query

  node = {}
  # Create a fake form
  class << node
    def search(*args); []; end
  end
  node['method'] = 'POST'
  node['enctype'] = 'application/x-www-form-urlencoded'

  form = Form.new(node)

  query.each { |k, v|
    if v.is_a?(IO)
      form.enctype = 'multipart/form-data'
      ul = Form::FileUpload.new({'name' => k.to_s},::File.basename(v.path))
      ul.file_data = v.read
      form.file_uploads << ul
    else
      form.fields << Form::Field.new({'name' => k.to_s},v)
    end
  }
  post_form(uri, form, headers)
end
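A hash and an array of arrays can be used interchangeably as the query because both yield key-value pairs when iterated with a two-argument block, which is how the source above walks the query. The sketch below demonstrates that equivalence with the standard library only. The encoding step uses URI.encode_www_form for illustration and is an assumption, since Mechanize builds a Form object instead.

```ruby
require 'uri'

hash_query  = { 'foo' => 'bar' }
array_query = [%w[foo bar]]

# Both shapes yield (key, value) to a two-argument block.
pairs_from_hash  = hash_query.map  { |k, v| [k.to_s, v] }
pairs_from_array = array_query.map { |k, v| [k.to_s, v] }

# Illustration only: encode the pairs as a urlencoded body.
encoded_body = URI.encode_www_form(pairs_from_hash)
```

A String query takes a different path entirely: it is sent as the request body unchanged through request_with_entity.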
#post_connect_hooks ⇒ Object
A list of hooks to call after retrieving a response. Hooks are called with the agent and the response returned.
# File 'lib/mechanize.rb', line 224
def post_connect_hooks
  @agent.post_connect_hooks
end
#pre_connect_hooks ⇒ Object
A list of hooks to call before making a request. Hooks are called with the agent and the request to be performed.
# File 'lib/mechanize.rb', line 232
def pre_connect_hooks
  @agent.pre_connect_hooks
end
#pretty_print(q) ⇒ Object
:nodoc:
# File 'lib/mechanize.rb', line 995
def pretty_print(q) # :nodoc:
  q.object_group(self) {
    q.breakable
    q.pp cookie_jar
    q.breakable
    q.pp current_page
  }
end
#put(uri, entity, headers = {}) ⇒ Object
PUT to uri with entity, and setting headers:
put('http://example/', 'new content', {'Content-Type' => 'text/plain'})
# File 'lib/mechanize.rb', line 388
def put(uri, entity, headers = {})
  request_with_entity(:put, uri, entity, headers)
end
#read_timeout ⇒ Object
Length of time to wait for data from the server
# File 'lib/mechanize.rb', line 692
def read_timeout
  @agent.read_timeout
end
#read_timeout=(read_timeout) ⇒ Object
Sets the timeout for each chunk of data read from the server to read_timeout. A single request may read many chunks of data.
# File 'lib/mechanize.rb', line 700
def read_timeout= read_timeout
  @agent.read_timeout = read_timeout
end
#redirect_ok ⇒ Object Also known as: follow_redirect?
Controls how mechanize deals with redirects. The following values are allowed:
- :all, true
  All 3xx redirects are followed (default)
- :permanent
  Only 301 Moved Permanently redirects are followed
- false
  No redirects are followed
# File 'lib/mechanize.rb', line 712
def redirect_ok
  @agent.redirect_ok
end
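The three policy values listed above can be read as a small decision table over the response status. The sketch below encodes that table as a standalone predicate. follow_redirect_for? is a hypothetical name, and this is not how Mechanize::HTTP::Agent implements the check.

```ruby
# Illustrative predicate (not Mechanize's implementation): should a response
# with the given status be followed under the given redirect_ok policy?
def follow_redirect_for?(policy, status)
  return false unless (300..399).cover?(status)

  case policy
  when :all, true then true          # every 3xx redirect
  when :permanent then status == 301 # only 301 Moved Permanently
  else false                         # false (or anything else): never
  end
end

all_follows_302       = follow_redirect_for?(:all, 302)
permanent_follows_301 = follow_redirect_for?(:permanent, 301)
permanent_follows_302 = follow_redirect_for?(:permanent, 302)
disabled_follows_301  = follow_redirect_for?(false, 301)
```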
#redirect_ok=(follow) ⇒ Object
Sets the mechanize redirect handling policy. See redirect_ok for allowed values
# File 'lib/mechanize.rb', line 722
def redirect_ok= follow
  @agent.redirect_ok = follow
end
#redirection_limit ⇒ Object
Maximum number of redirections to follow
# File 'lib/mechanize.rb', line 729
def redirection_limit
  @agent.redirection_limit
end
#redirection_limit=(limit) ⇒ Object
Sets the maximum number of redirections to follow to limit
# File 'lib/mechanize.rb', line 736
def redirection_limit= limit
  @agent.redirection_limit = limit
end
#request_headers ⇒ Object
A hash of custom request headers that will be sent on every request
# File 'lib/mechanize.rb', line 743
def request_headers
  @agent.request_headers
end
#request_headers=(request_headers) ⇒ Object
Replaces the custom request headers that will be sent on every request with request_headers
# File 'lib/mechanize.rb', line 751
def request_headers= request_headers
  @agent.request_headers = request_headers
end
#request_with_entity(verb, uri, entity, headers = {}) ⇒ Object
Makes an HTTP request to uri using HTTP method verb. entity is used as the request body, if allowed.
# File 'lib/mechanize.rb', line 396
def request_with_entity(verb, uri, entity, headers = {})
  cur_page = current_page || Page.new(nil, {'content-type'=>'text/html'})

  headers = {
    'Content-Type' => 'application/octet-stream',
    'Content-Length' => entity.size.to_s,
  }.update headers

  page = @agent.fetch uri, verb, headers, [entity], cur_page
  add_to_history(page)
  page
end
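The headers sent with an entity are the two defaults shown above, overridden by whatever the caller passes. The sketch below isolates that merge so the override behaviour can be checked on its own. entity_headers is a hypothetical helper name.

```ruby
# Illustrative helper (not part of Mechanize): build the header hash the way
# the source above does, with caller-supplied headers taking precedence.
def entity_headers(entity, headers = {})
  {
    'Content-Type'   => 'application/octet-stream',
    'Content-Length' => entity.size.to_s,
  }.update(headers)
end

default_headers  = entity_headers('new content')
override_headers = entity_headers('new content', { 'Content-Type' => 'text/plain' })
```

Because this is an ordinary Hash merge, an override only replaces a default when the key matches exactly, so a lower-case 'content-type' key would be added alongside the default rather than replace it. String#size also counts characters rather than bytes, which matters for multibyte entities.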
#retry_change_requests ⇒ Object
Retry POST and other non-idempotent requests. See RFC 2616 9.1.2.
# File 'lib/mechanize.rb', line 758
def retry_change_requests
  @agent.retry_change_requests
end
#retry_change_requests=(retry_change_requests) ⇒ Object
When setting retry_change_requests to true you are stating that, for all the URLs you access with mechanize, making POST and other non-idempotent requests is safe and will not cause data duplication or other harmful results.
If you are experiencing “too many connection resets” errors, you should instead investigate reducing the idle_timeout or disabling keep_alive connections.
# File 'lib/mechanize.rb', line 772
def retry_change_requests= retry_change_requests
  @agent.retry_change_requests = retry_change_requests
end
#robots ⇒ Object
Will /robots.txt files be obeyed?
# File 'lib/mechanize.rb', line 779
def robots
  @agent.robots
end
#robots=(enabled) ⇒ Object
When enabled, mechanize will retrieve and obey robots.txt files.
# File 'lib/mechanize.rb', line 787
def robots= enabled
  @agent.robots = enabled
end
#scheme_handlers ⇒ Object
The handlers for HTTP and other URI protocols.
# File 'lib/mechanize.rb', line 794
def scheme_handlers
  @agent.scheme_handlers
end
#scheme_handlers=(scheme_handlers) ⇒ Object
Replaces the URI scheme handler table with scheme_handlers
# File 'lib/mechanize.rb', line 801
def scheme_handlers= scheme_handlers
  @agent.scheme_handlers = scheme_handlers
end
#set_proxy(address, port, user = nil, password = nil) ⇒ Object
Sets the proxy address at port with an optional user and password.
# File 'lib/mechanize.rb', line 1007
def set_proxy address, port, user = nil, password = nil
  @proxy_addr = address
  @proxy_port = port
  @proxy_user = user
  @proxy_pass = password

  @agent.set_proxy address, port, user, password
  @agent.set_http
end
#submit(form, button = nil, headers = {}) ⇒ Object
Submits form with an optional button. Only the POST and GET form methods are supported; any other method raises ArgumentError.
Without a button:
page = agent.get('http://example.com')
agent.submit(page.forms.first)
With a button:
agent.submit(page.forms.first, page.forms.first.buttons.first)
# File 'lib/mechanize.rb', line 421
def submit(form, button=nil, headers={})
  form.add_button_to_query(button) if button
  case form.method.upcase
  when 'POST'
    post_form(form.action, form, headers)
  when 'GET'
    get(form.action.gsub(/\?[^\?]*$/, ''),
        form.build_query,
        form.page,
        headers)
  else
    raise ArgumentError, "unsupported method: #{form.method.upcase}"
  end
end
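In the GET branch, the form's action is passed through gsub before the request is made. The sketch below isolates that one substitution to show its effect; strip_query is a hypothetical name used only for illustration, and the URLs are placeholders.

```ruby
# Hypothetical helper (not a Mechanize method) applying the same regular
# expression that submit uses on a GET form's action: a trailing "?..."
# query string is removed so the form's own query can take its place.
def strip_query(action)
  action.gsub(/\?[^\?]*$/, '')
end

with_query    = strip_query('http://example.com/search?q=old')
without_query = strip_query('http://example.com/search')

puts with_query
puts without_query
```

Both calls yield the same action, which suggests that a query string written into a GET form's action attribute is discarded in favor of the fields collected by form.build_query.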
#transact ⇒ Object
Runs given block, then resets the page history as it was before. self is given as a parameter to the block. Returns the value of the block.
# File 'lib/mechanize.rb', line 441
def transact
  history_backup = @agent.history.dup
  begin
    yield self
  ensure
    @agent.history = history_backup
  end
end
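The backup-and-restore behaviour of transact can be modelled in a few lines. This is a sketch only: TinyAgent is a hypothetical stand-in class that keeps its history in a plain Array, and it reproduces just the dup, yield, and ensure steps from the source above.

```ruby
# Hypothetical stand-in for an agent; only history handling is modelled.
class TinyAgent
  attr_reader :history

  def initialize
    @history = []
  end

  # Record a visit (no network access takes place in this sketch).
  def visit(url)
    @history << url
    url
  end

  # Same shape as Mechanize#transact: back up, yield self, always restore.
  def transact
    history_backup = @history.dup
    begin
      yield self
    ensure
      @history = history_backup
    end
  end
end

agent = TinyAgent.new
agent.visit('http://example.com/')

result = agent.transact do |a|
  a.visit('http://example.com/a')
  a.visit('http://example.com/b')
  a.history.size # value of the block, returned by transact
end

puts result
puts agent.history.inspect
```

Because the restore happens in an ensure clause, the history is also rolled back when the block raises an exception.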
#user_agent ⇒ Object
The identification string for the client initiating a web request
# File 'lib/mechanize.rb', line 808
def user_agent
  @agent.user_agent
end
#user_agent=(user_agent) ⇒ Object
Sets the User-Agent used by mechanize to user_agent. See also user_agent_alias.
# File 'lib/mechanize.rb', line 816
def user_agent= user_agent
  @agent.user_agent = user_agent
end
#user_agent_alias=(name) ⇒ Object
Sets the user agent for the Mechanize object based on the given name. Raises ArgumentError if name is not a known alias.
See also AGENT_ALIASES.
# File 'lib/mechanize.rb', line 825
def user_agent_alias= name
  self.user_agent = AGENT_ALIASES[name] ||
    raise(ArgumentError, "unknown agent alias #{name.inspect}")
end
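The lookup-or-raise idiom in this setter can be sketched with a small table. The two entries below are copied from AGENT_ALIASES for illustration; the names aliases and resolve_alias are hypothetical and exist only in this example.

```ruby
# Two entries copied from AGENT_ALIASES; the real table is larger.
aliases = {
  'Windows IE 9'    => 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
  'Linux Konqueror' => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)',
}

# Same idiom as user_agent_alias=: a missing key yields nil, so the
# right-hand side of || runs and raises ArgumentError.
resolve_alias = lambda do |name|
  aliases[name] || raise(ArgumentError, "unknown agent alias #{name.inspect}")
end

puts resolve_alias.call('Linux Konqueror')

error_message = nil
begin
  resolve_alias.call('linux konqueror') # wrong capitalization
rescue ArgumentError => e
  error_message = e.message
end
puts error_message
```

Hash lookup is an exact string match, so an alias name must be spelled and capitalized exactly as it appears as a key in the table.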
#verify_callback ⇒ Object
A callback for additional certificate verification. See OpenSSL::SSL::SSLContext#verify_callback.
The callback can be used for debugging or to ignore errors by always returning true. Specifying nil uses the default method that was valid when the SSLContext was created.
# File 'lib/mechanize.rb', line 930
def verify_callback
  @agent.verify_callback
end
#verify_callback=(verify_callback) ⇒ Object
Sets the OpenSSL certificate verification callback
# File 'lib/mechanize.rb', line 937
def verify_callback= verify_callback
  @agent.verify_callback = verify_callback
end
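As a rough illustration of the debugging use, the sketch below builds a callback with the two-argument shape that OpenSSL::SSL::SSLContext#verify_callback expects and attaches it to a bare SSLContext from Ruby's standard library. The name log_callback is an assumption for this example; passing the same lambda to verify_callback= on a Mechanize agent is assumed, not shown, to behave equivalently.

```ruby
require 'openssl'

# Logs failed verifications but leaves OpenSSL's decision unchanged.
# Returning true unconditionally would instead ignore all errors.
log_callback = lambda do |preverify_ok, store_context|
  unless preverify_ok
    warn "certificate verification failed: #{store_context.error_string}"
  end
  preverify_ok
end

context = OpenSSL::SSL::SSLContext.new
context.verify_mode = OpenSSL::SSL::VERIFY_PEER
context.verify_callback = log_callback
```

Returning preverify_ok keeps certificate checking intact while still surfacing the reason for a failure, which is usually safer than a callback that always returns true.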
#verify_mode ⇒ Object
The OpenSSL server certificate verification method. The default is OpenSSL::SSL::VERIFY_PEER and certificate verification uses the default system certificates. See also cert_store.
# File 'lib/mechanize.rb', line 946
def verify_mode
  @agent.verify_mode
end
#verify_mode=(verify_mode) ⇒ Object
Sets the OpenSSL server certificate verification method.
# File 'lib/mechanize.rb', line 953
def verify_mode= verify_mode
  @agent.verify_mode = verify_mode
end
#visited?(url) ⇒ Boolean Also known as: visited_page
Returns a visited page for the url passed in, otherwise nil. url may be a URL or any object that responds to href, such as a link.
# File 'lib/mechanize.rb', line 190
def visited? url
  url = url.href if url.respond_to? :href

  @agent.visited_page url
end
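The respond_to? check lets callers pass either a URL or a link-like object. A minimal sketch of that duck typing follows; link_type, pages, and lookup are hypothetical stand-ins for a link object, the agent's history, and the method itself.

```ruby
# Hypothetical stand-ins: link_type plays the role of any object with #href,
# and the pages hash plays the role of the agent's history lookup.
link_type = Struct.new(:href)
pages = { 'http://example.com/' => :home_page }

# Same first step as visited?: unwrap objects that respond to #href.
lookup = lambda do |url|
  url = url.href if url.respond_to?(:href)
  pages[url]
end

by_string = lookup.call('http://example.com/')
by_link   = lookup.call(link_type.new('http://example.com/'))
missing   = lookup.call('http://example.com/missing')

p by_string
p by_link
p missing
```

Despite the trailing question mark in its name, the method returns the page itself or nil rather than true or false, so the result can be used both in a conditional and as a value.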