Class: Mechanize
- Inherits:
- Object
- Mechanize
- Defined in:
- lib/mechanize.rb,
The Mechanize library is used for automating interactions with a website. It can follow links and submit forms. Form fields can be populated and submitted. A history of URLs is maintained and can be queried.
require 'mechanize'
require 'logger'
agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari'
page = agent.get ""
search_form = page.form_with :name => "f"
search_form.field_with(:name => "q").value = "Hello"
search_results = agent.submit search_form
puts search_results.body
Issues with mechanize
If you think you have a bug with mechanize, but aren’t sure, please file a ticket at
Here are some common problems you may experience with mechanize
Problems connecting to SSL sites
Mechanize defaults to validating SSL certificates using the default CA certificates for your platform. At this time, Windows users do not have integration between the OS default CA certificates and OpenSSL. #cert_store explains how to download and use Mozilla’s CA certificates to allow SSL sites to work.
Problems with content-length
Some sites return an incorrect content-length value. Unlike a browser, mechanize raises an error when the content-length header does not match the response length since it does not know if there was a connection problem or if the mismatch is a server bug.
The error raised, Mechanize::ResponseReadError, can be converted to a parsed Page, File, etc. depending upon the content-type:
agent = Mechanize.new
uri = URI 'http://example/invalid_content_length'
begin
  page = agent.get uri
rescue Mechanize::ResponseReadError => e
  page = e.force_parse
end
Defined Under Namespace
Modules: CookieCMethods, CookieDeprecated, CookieIMethods, CookieJarIMethods, ElementMatcher, Parser, Prependable
Classes: ChunkedTerminationError, ContentTypeError, Cookie, CookieJar, DirectorySaver, Download, ElementNotFoundError, Error, File, FileConnection, FileRequest, FileResponse, FileSaver, Form, HTTP, Headers, History, Image, Page, PluggableParser, RedirectLimitReachedError, RedirectNotGetOrHeadError, ResponseCodeError, ResponseReadError, RobotsDisallowedError, TestCase, UnauthorizedError, UnsupportedSchemeError, Util, XmlFile
Constant Summary
Supported User-Agent aliases for use with user_agent_alias=. The description in parentheses is for informative purposes and is not part of the alias name.
The default User-Agent alias:
“Mechanize”
Linux User-Agent aliases:
“Linux Firefox”
“Linux Konqueror”
“Linux Mozilla”
Mac User-Agent aliases:
“Mac Firefox”
“Mac Mozilla”
“Mac Safari 4”
“Mac Safari”
Windows User-Agent aliases:
“Windows Chrome”
“Windows Edge”
“Windows Firefox”
“Windows IE 6”
“Windows IE 7”
“Windows IE 8”
“Windows IE 9”
“Windows IE 10”
“Windows IE 11”
“Windows Mozilla”
Mobile User-Agent aliases:
“Android”
“iPad”
“iPhone”
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
{
  'Mechanize' => "Mechanize/#{VERSION} Ruby/#{ruby_version} (",
  'Linux Firefox' => 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:109.0) Gecko/20100101 Firefox/121.0',
  'Linux Konqueror' => 'Mozilla/5.0 (compatible; Konqueror/3; Linux)',
  'Linux Mozilla' => 'Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624',
  'Mac Firefox' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 14.2; rv:109.0) Gecko/20100101 Firefox/121.0',
  'Mac Mozilla' => 'Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.4a) Gecko/20030401',
  'Mac Safari 4' => 'Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_2; de-at) AppleWebKit/531.21.8 (KHTML, like Gecko) Version/4.0.4 Safari/531.21.10',
  'Mac Safari' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 14_2_1) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Safari/605.1.15',
  'Windows Chrome' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36',
  'Windows Edge' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/ Safari/537.36 Edg/120.0.2210.133',
  'Windows Firefox' => 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/121.0',
  'Windows IE 6' => 'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)',
  'Windows IE 7' => 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
  'Windows IE 8' => 'Mozilla/5.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; .NET CLR 1.1.4322; .NET CLR 2.0.50727)',
  'Windows IE 9' => 'Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0)',
  'Windows IE 10' => 'Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.2; WOW64; Trident/6.0)',
  'Windows IE 11' => 'Mozilla/5.0 (Windows NT 6.3; WOW64; Trident/7.0; rv:11.0) like Gecko',
  'Windows Mozilla' => 'Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.4b) Gecko/20030516 Mozilla Firebird/0.6',
  'Android' => 'Mozilla/5.0 (Linux; Android 10; K) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.6099.210 Mobile Safari/537.36',
  'iPad' => 'Mozilla/5.0 (iPad; CPU OS 17_2_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1',
  'iPhone' => 'Mozilla/5.0 (iPhone; CPU iPhone OS 17_2_1 like Mac OS X) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.2 Mobile/15E148 Safari/604.1',
}
Class Attribute Summary
.html_parser ⇒ Object
Default HTML parser for all mechanize instances.
.log ⇒ Object
Default logger for all mechanize instances.
Instance Attribute Summary
#agent ⇒ Object
:section: Utilities.
#default_encoding ⇒ Object
A default encoding name used when parsing HTML.
#force_default_encoding ⇒ Object
Overrides the encodings given by the HTTP server and the HTML page with the default_encoding when set to true.
#history_added ⇒ Object
Callback which is invoked with the page that was added to history.
#html_parser ⇒ Object
The HTML parser to be used when parsing documents.
#keep_alive_time ⇒ Object
HTTP/1.0 keep-alive time.
#pluggable_parser ⇒ Object
The pluggable parser maps a response Content-Type to a parser class.
#proxy_addr ⇒ Object
The HTTP proxy address.
#proxy_pass ⇒ Object
The HTTP proxy password.
#proxy_port ⇒ Object
The HTTP proxy port.
#proxy_user ⇒ Object
The HTTP proxy username.
#watch_for_set ⇒ Object
The value of watch_for_set is passed to pluggable parsers for retrieved content.
Class Method Summary
.inherited(child) ⇒ Object
.start ⇒ Object
Creates a new Mechanize instance and yields it to the given block.
Instance Method Summary
#add_auth(uri, user, password, realm = nil, domain = nil) ⇒ Object
Adds credentials user, pass for uri.
#auth(user, password, domain = nil) ⇒ Object
(also: #basic_auth)
NOTE: These credentials will be used as a default for any challenge exposing your password to disclosure to malicious servers.
#back ⇒ Object
Equivalent to the browser back button.
#ca_file ⇒ Object
Path to an OpenSSL server certificate file.
#ca_file=(ca_file) ⇒ Object
Sets the certificate file used for SSL connections.
#cert ⇒ Object
An OpenSSL client certificate or the path to a certificate file.
#cert=(cert) ⇒ Object
Sets the OpenSSL client certificate cert to the given path or certificate instance.
#cert_store ⇒ Object
An OpenSSL certificate store for verifying server certificates.
#cert_store=(cert_store) ⇒ Object
Sets the OpenSSL certificate store to store.
#certificate ⇒ Object
What is this?
#click(link) ⇒ Object
If the parameter is a string, finds the button or link with the value of the string on the current page and clicks it.
#conditional_requests ⇒ Object
Are If-Modified-Since conditional requests enabled?
#conditional_requests=(enabled) ⇒ Object
Disables If-Modified-Since conditional requests (enabled by default).
#content_encoding_hooks ⇒ Object
A list of hooks to call before reading response header ‘content-encoding’.
#cookie_jar ⇒ Object
A Mechanize::CookieJar which stores cookies.
#cookie_jar=(cookie_jar) ⇒ Object
Replaces the cookie jar with cookie_jar.
#cookies ⇒ Object
Returns a list of cookies stored in the cookie jar.
#current_page ⇒ Object
(also: #page)
Returns the latest page loaded by Mechanize.
#delete(uri, query_params = {}, headers = {}) ⇒ Object
DELETE uri with query_params, and setting headers.
#download(uri, io_or_filename, parameters = [], referer = nil, headers = {}) ⇒ Object
GETs uri and writes it to io_or_filename without recording the request in the history.
#follow_meta_refresh ⇒ Object
Follow HTML meta refresh and HTTP Refresh headers.
#follow_meta_refresh=(follow) ⇒ Object
Controls following of HTML meta refresh and HTTP Refresh headers in responses.
#follow_meta_refresh_self ⇒ Object
Follow an HTML meta refresh and HTTP Refresh headers that have no “url=” in the content attribute.
#follow_meta_refresh_self=(follow) ⇒ Object
Alters the following of HTML meta refresh and HTTP Refresh headers that point to the same page.
#get(uri, parameters = [], referer = nil, headers = {}) {|page| ... } ⇒ Object
GET the uri with the given request parameters, referer and headers.
#get_file(url) ⇒ Object
GET url and return only its contents.
#gzip_enabled ⇒ Object
Is gzip compression of responses enabled?
#gzip_enabled=(enabled) ⇒ Object
Disables HTTP/1.1 gzip compression (enabled by default).
#head(uri, query_params = {}, headers = {}) {|page| ... } ⇒ Object
HEAD uri with query_params and headers.
#history ⇒ Object
The history of this mechanize run.
#idle_timeout ⇒ Object
Connections that have not been used in this many seconds will be reset.
#idle_timeout=(idle_timeout) ⇒ Object
Sets the idle timeout to idle_timeout.
#ignore_bad_chunking ⇒ Object
When set to true mechanize will ignore an EOF during chunked transfer encoding so long as at least one byte was received.
#ignore_bad_chunking=(ignore_bad_chunking) ⇒ Object
When set to true mechanize will ignore an EOF during chunked transfer encoding.
#initialize(connection_name = 'mechanize') {|_self| ... } ⇒ Mechanize
Creates a new mechanize instance.
#keep_alive ⇒ Object
Are HTTP/1.1 keep-alive connections enabled?
#keep_alive=(enable) ⇒ Object
Disable HTTP/1.1 keep-alive connections if enable is set to false.
#key ⇒ Object
An OpenSSL private key or the path to a private key.
#key=(key) ⇒ Object
Sets the OpenSSL client key to the given path or key instance.
#log ⇒ Object
The current logger.
#log=(logger) ⇒ Object
Sets the logger used by this instance of mechanize.
#max_file_buffer ⇒ Object
Responses larger than this will be written to a Tempfile instead of stored in memory.
#max_file_buffer=(bytes) ⇒ Object
Sets the maximum size of a response body that will be stored in memory to bytes.
#max_history ⇒ Object
Maximum number of items allowed in the history.
#max_history=(length) ⇒ Object
Sets the maximum number of items allowed in the history to length.
#open_timeout ⇒ Object
Length of time to wait until a connection is opened in seconds.
#open_timeout=(open_timeout) ⇒ Object
Sets the connection open timeout to open_timeout.
#parse(uri, response, body) ⇒ Object
Parses the body of the response from uri using the pluggable parser that matches its content type.
#pass ⇒ Object
OpenSSL client key password.
#pass=(pass) ⇒ Object
Sets the client key password to pass.
#post(uri, query = {}, headers = {}) ⇒ Object
POST to the given uri with the given query.
#post_connect_hooks ⇒ Object
A list of hooks to call after retrieving a response.
#pre_connect_hooks ⇒ Object
A list of hooks to call before retrieving a response.
#pretty_print(q) ⇒ Object
#put(uri, entity, headers = {}) ⇒ Object
PUT to uri with entity, and setting headers.
#read_timeout ⇒ Object
Length of time to wait for data from the server.
#read_timeout=(read_timeout) ⇒ Object
Sets the timeout for each chunk of data read from the server to read_timeout.
#redirect_ok ⇒ Object
(also: #follow_redirect?)
Controls how mechanize deals with redirects.
#redirect_ok=(follow) ⇒ Object
(also: #follow_redirect=)
Sets the mechanize redirect handling policy.
#redirection_limit ⇒ Object
Maximum number of redirections to follow.
#redirection_limit=(limit) ⇒ Object
Sets the maximum number of redirections to follow to limit.
#request_headers ⇒ Object
A hash of custom request headers that will be sent on every request.
#request_headers=(request_headers) ⇒ Object
Replaces the custom request headers that will be sent on every request with request_headers.
#request_with_entity(verb, uri, entity, headers = {}) ⇒ Object
Makes an HTTP request to uri using HTTP method verb.
#reset ⇒ Object
Clears history and cookies.
#resolve(link) ⇒ Object
Resolve the full path of a link / uri.
#retry_change_requests ⇒ Object
Retry POST and other non-idempotent requests.
#retry_change_requests=(retry_change_requests) ⇒ Object
When setting retry_change_requests to true you are stating that, for all the URLs you access with mechanize, making POST and other non-idempotent requests is safe and will not cause data duplication or other harmful results.
#robots ⇒ Object
Will robots.txt files be obeyed?
#robots=(enabled) ⇒ Object
When enabled, mechanize will retrieve and obey robots.txt files.
#scheme_handlers ⇒ Object
The handlers for HTTP and other URI protocols.
#scheme_handlers=(scheme_handlers) ⇒ Object
Replaces the URI scheme handler table with scheme_handlers.
#set_proxy(address, port, user = nil, password = nil) ⇒ Object
Sets the proxy address at port with an optional user and password.
#shutdown ⇒ Object
Shuts down this session by clearing browsing state and closing all persistent connections.
#ssl_version ⇒ Object
SSL version to use.
#ssl_version=(ssl_version) ⇒ Object
Sets the SSL version to use to ssl_version without client/server negotiation.
#submit(form, button = nil, headers = {}) ⇒ Object
Submits form with an optional button.
#transact ⇒ Object
Runs given block, then resets the page history as it was before.
#user_agent ⇒ Object
The identification string for the client initiating a web request.
#user_agent=(user_agent) ⇒ Object
Sets the User-Agent used by mechanize to user_agent.
#user_agent_alias=(name) ⇒ Object
Set the user agent for the Mechanize object based on the given name.
#verify_callback ⇒ Object
A callback for additional certificate verification.
#verify_callback=(verify_callback) ⇒ Object
Sets the OpenSSL certificate verification callback.
#verify_mode ⇒ Object
The OpenSSL server certificate verification method.
#verify_mode=(verify_mode) ⇒ Object
Sets the OpenSSL server certificate verification method.
#visited?(url) ⇒ Boolean
(also: #visited_page)
Returns a visited page for the url passed in, otherwise nil.
#write_timeout ⇒ Object
Length of time to wait for data to be sent to the server.
#write_timeout=(write_timeout) ⇒ Object
Sets the timeout for each chunk of data to be sent to the server to write_timeout.
Constructor Details
#initialize(connection_name = 'mechanize') {|_self| ... } ⇒ Mechanize
Creates a new mechanize instance. If a block is given, the created instance is yielded to the block for setting up pre-connection state such as SSL parameters or proxies:
agent = Mechanize.new do |a|
  a.proxy_addr = 'proxy.example'
  a.proxy_port = 8080
end
If you need segregated SSL connections give each agent a unique name. Otherwise the connections will be shared. This is particularly important if you are using certificates.
agent_1 = Mechanize.new 'conn1'
agent_2 = Mechanize.new 'conn2'
# File 'lib/mechanize.rb', line 209
def initialize(connection_name = 'mechanize')
  @agent = Mechanize::HTTP::Agent.new(connection_name)
  @agent.context = self
  @log = nil

  # attr_accessors
  @agent.user_agent = AGENT_ALIASES['Mechanize']
  @watch_for_set = nil
  @history_added = nil

  # attr_readers
  @pluggable_parser = PluggableParser.new
  @keep_alive_time = 0

  # Proxy
  @proxy_addr = nil
  @proxy_port = nil
  @proxy_user = nil
  @proxy_pass = nil

  @html_parser = self.class.html_parser

  @default_encoding = nil
  @force_default_encoding = false

  # defaults
  @agent.max_history = 50

  yield self if block_given?

  @agent.set_proxy @proxy_addr, @proxy_port, @proxy_user, @proxy_pass
end
Class Attribute Details
.html_parser ⇒ Object
Default HTML parser for all mechanize instances
Mechanize.html_parser = Nokogiri::XML
# File 'lib/mechanize.rb', line 642
def html_parser
  @html_parser
end
.log ⇒ Object
Default logger for all mechanize instances
Mechanize.log = Logger.new $stderr
# File 'lib/mechanize.rb', line 649
def log
  @log
end
Instance Attribute Details
#agent ⇒ Object (readonly)
:section: Utilities
# File 'lib/mechanize.rb', line 1268
def agent
  @agent
end
#default_encoding ⇒ Object
A default encoding name used when parsing HTML. When set it is used after any other encoding. The default is nil.
# File 'lib/mechanize.rb', line 657
def default_encoding
  @default_encoding
end
#force_default_encoding ⇒ Object
Overrides the encodings given by the HTTP server and the HTML page with the default_encoding when set to true.
# File 'lib/mechanize.rb', line 663
def force_default_encoding
  @force_default_encoding
end
#history_added ⇒ Object
Callback which is invoked with the page that was added to history.
# File 'lib/mechanize.rb', line 324
def history_added
  @history_added
end
#html_parser ⇒ Object
The HTML parser to be used when parsing documents
# File 'lib/mechanize.rb', line 668
def html_parser
  @html_parser
end
#keep_alive_time ⇒ Object
HTTP/1.0 keep-alive time. This is no longer supported by mechanize as it now uses net-http-persistent which only supports HTTP/1.1 persistent connections
# File 'lib/mechanize.rb', line 675
def keep_alive_time
  @keep_alive_time
end
#pluggable_parser ⇒ Object (readonly)
The pluggable parser maps a response Content-Type to a parser class. The registered Content-Type may be either a full content type like ‘image/png’ or a media type ‘text’. See Mechanize::PluggableParser for further details.
agent.pluggable_parser['application/octet-stream'] = Mechanize::Download
# File 'lib/mechanize.rb', line 687
def pluggable_parser
  @pluggable_parser
end
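The lookup order implied above (a full content type first, then a bare media type) can be sketched with a small stand-in registry. This is only an illustration: ParserRegistry, its default handler, and the symbols used as handlers are hypothetical and are not part of mechanize; Mechanize::PluggableParser is the authoritative implementation.

```ruby
# Hypothetical stand-in registry; not Mechanize::PluggableParser itself.
class ParserRegistry
  def initialize(default)
    @default = default
    @parsers = {}
  end

  # Register a handler for a full content type ('image/png') or a media type ('text').
  def []=(content_type, handler)
    @parsers[content_type] = handler
  end

  # Prefer an exact content-type match, then the media type, then the default.
  def lookup(content_type)
    type = content_type.to_s.split(';').first.to_s.strip.downcase
    media_type = type.split('/').first
    @parsers[type] || @parsers[media_type] || @default
  end
end

registry = ParserRegistry.new(:file)
registry['text'] = :text_parser
registry['image/png'] = :image_parser

puts registry.lookup('image/png')                  # image_parser
puts registry.lookup('text/plain; charset=utf-8')  # text_parser
puts registry.lookup('application/zip')            # file
```

Registering a media type such as 'text' acts as a catch-all for every subtype that has no more specific entry.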
#proxy_addr ⇒ Object (readonly)
The HTTP proxy address
# File 'lib/mechanize.rb', line 692
def proxy_addr
  @proxy_addr
end
#proxy_pass ⇒ Object (readonly)
The HTTP proxy password
# File 'lib/mechanize.rb', line 697
def proxy_pass
  @proxy_pass
end
#proxy_port ⇒ Object (readonly)
The HTTP proxy port
# File 'lib/mechanize.rb', line 702
def proxy_port
  @proxy_port
end
#proxy_user ⇒ Object (readonly)
The HTTP proxy username
# File 'lib/mechanize.rb', line 707
def proxy_user
  @proxy_user
end
#watch_for_set ⇒ Object
The value of watch_for_set is passed to pluggable parsers for retrieved content
# File 'lib/mechanize.rb', line 1107
def watch_for_set
  @watch_for_set
end
Class Method Details
.inherited(child) ⇒ Object
# File 'lib/mechanize.rb', line 168
def self.inherited(child) # :nodoc:
  child.html_parser = html_parser
  child.log = log
  super
end
.start ⇒ Object
Creates a new Mechanize instance and yields it to the given block.
After the block executes, the instance is cleaned up. This includes closing all open connections.
Mechanize.start do |m|
  # use m to make requests here
end
# File 'lib/mechanize.rb', line 184
def self.start
  instance = new
  yield(instance)
ensure
  instance.shutdown
end
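The cleanup guarantee of .start comes from the ensure clause: the instance is shut down even when the block raises. A minimal sketch of that behaviour, using a hypothetical FakeSession class in place of a real agent so that it runs without network access:

```ruby
# Hypothetical stand-in for an agent; only #shutdown matters here.
class FakeSession
  attr_reader :closed

  def initialize
    @closed = false
  end

  def shutdown
    @closed = true
  end

  # Same shape as Mechanize.start: yield a fresh instance, always shut it down.
  def self.start
    instance = new
    yield(instance)
  ensure
    instance.shutdown
  end
end

session = nil
begin
  FakeSession.start do |s|
    session = s
    raise 'request failed'
  end
rescue RuntimeError
  # The error still propagates to the caller, but cleanup has already run.
end

puts session.closed  # true
```

Because ensure does not replace the method's return value, the block's result is still returned to the caller when no error occurs.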
Instance Method Details
#add_auth(uri, user, password, realm = nil, domain = nil) ⇒ Object
Adds credentials user, pass for uri. If realm is set the credentials are used only for that realm. If realm is not set the credentials become the default for any realm on that URI.
domain and realm are exclusive as NTLM does not follow RFC 2617. If domain is given it is only used for NTLM authentication.
# File 'lib/mechanize.rb', line 742
def add_auth uri, user, password, realm = nil, domain = nil
  @agent.add_auth uri, user, password, realm, domain
end
#auth(user, password, domain = nil) ⇒ Object Also known as: basic_auth
NOTE: These credentials will be used as a default for any challenge exposing your password to disclosure to malicious servers. Use of this method will warn. This method is deprecated and will be removed in mechanize 3.
Sets the user and password as the default credentials to be used for HTTP authentication for any server. The domain is used for NTLM authentication.
# File 'lib/mechanize.rb', line 719
def auth user, password, domain = nil
  c = caller_locations(1,1).first

  warn <<-WARNING
At #{c.absolute_path} line #{c.lineno}

Use of #auth and #basic_auth are deprecated due to a security vulnerability.

  WARNING

  @agent.add_default_auth user, password, domain
end
#back ⇒ Object
Equivalent to the browser back button. Returns the previous page visited.
# File 'lib/mechanize.rb', line 250
def back
  @agent.history.pop
end
#ca_file ⇒ Object
Path to an OpenSSL server certificate file
# File 'lib/mechanize.rb', line 1117
def ca_file
  @agent.ca_file
end
#ca_file=(ca_file) ⇒ Object
Sets the certificate file used for SSL connections
# File 'lib/mechanize.rb', line 1124
def ca_file= ca_file
  @agent.ca_file = ca_file
end
#cert ⇒ Object
An OpenSSL client certificate or the path to a certificate file.
# File 'lib/mechanize.rb', line 1131
def cert
  @agent.certificate
end
#cert=(cert) ⇒ Object
Sets the OpenSSL client certificate cert to the given path or certificate instance.
# File 'lib/mechanize.rb', line 1139
def cert= cert
  @agent.certificate = cert
end
#cert_store ⇒ Object
An OpenSSL certificate store for verifying server certificates. This defaults to the default certificate store for your system.
If your system does not ship with a default set of certificates you can retrieve a copy of the set from Mozilla here:
(Note that this set does not have an HTTPS download option so you may wish to use the script to extract the certificates from a local install to avoid man-in-the-middle attacks.)
After downloading or generating a cacert.pem from the above link you can create a certificate store from the pem file like this:
cert_store = OpenSSL::X509::Store.new
cert_store.add_file 'cacert.pem'
And have mechanize use it with:
agent.cert_store = cert_store
# File 'lib/mechanize.rb', line 1165
def cert_store
  @agent.cert_store
end
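The store-building steps above can be wrapped in a small helper. This is a sketch using only Ruby's standard openssl library; build_cert_store is a hypothetical name, and loading the platform defaults first and skipping a missing cacert.pem are choices made for this illustration rather than mechanize behaviour.

```ruby
require 'openssl'

# Build a certificate store from the platform defaults and, when the file
# exists, add the certificates from a PEM bundle such as Mozilla's cacert.pem.
def build_cert_store(pem_path = nil)
  store = OpenSSL::X509::Store.new
  store.set_default_paths
  store.add_file(pem_path) if pem_path && File.exist?(pem_path)
  store
end

cert_store = build_cert_store('cacert.pem')
# With an existing agent: agent.cert_store = cert_store
```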
#cert_store=(cert_store) ⇒ Object
Sets the OpenSSL certificate store to store.
See also #cert_store
# File 'lib/mechanize.rb', line 1174
def cert_store= cert_store
  @agent.cert_store = cert_store
end
#certificate ⇒ Object
What is this?
Why is it different from #cert?
# File 'lib/mechanize.rb', line 1183
def certificate # :nodoc:
  @agent.certificate
end
#click(link) ⇒ Object
If the parameter is a string, finds the button or link with the value of the string on the current page and clicks it. Otherwise, clicks the Mechanize::Page::Link object passed in. Returns the page fetched.
# File 'lib/mechanize.rb', line 351
def click link
  case link
  when Page::Link then
    referer = link.page || current_page()
    if @agent.robots
      if (referer.is_a?(Page) and referer.parser.nofollow?) or
         link.rel?('nofollow') then
        raise RobotsDisallowedError.new(link.href)
      end
    end
    if link.noreferrer?
      href = @agent.resolve(link.href, link.page || current_page)
      referer = Page.new
    else
      href = link.href
    end
    get href, [], referer
  when String, Regexp then
    if real_link = page.link_with(:text => link)
      click real_link
    else
      button = nil
      # Note that this will not work if we have since navigated to a different page.
      # Should rather make each button aware of its parent form.
      form = page.forms.find do |f|
        button = f.button_with(:value => link)
        button.is_a? Form::Submit
      end
      submit form, button if form
    end
  when Form::Submit, Form::ImageButton then
    # Note that this will not work if we have since navigated to a different page.
    # Should rather make each button aware of its parent form.
    form = page.forms.find do |f|
      f.buttons.include?(link)
    end
    submit form, link if form
  else
    referer = current_page()
    href = link.respond_to?(:href) ? link.href :
      (link['href'] || link['src'])
    get href, [], referer
  end
end
#conditional_requests ⇒ Object
Are If-Modified-Since conditional requests enabled?
# File 'lib/mechanize.rb', line 749
def conditional_requests
  @agent.conditional_requests
end
#conditional_requests=(enabled) ⇒ Object
Disables If-Modified-Since conditional requests (enabled by default)
# File 'lib/mechanize.rb', line 756
def conditional_requests= enabled
  @agent.conditional_requests = enabled
end
#content_encoding_hooks ⇒ Object
A list of hooks to call before reading response header ‘content-encoding’.
The hook is called with the agent making the request, the URI of the request, the response, and an IO containing the response body.
# File 'lib/mechanize.rb', line 317
def content_encoding_hooks
  @agent.content_encoding_hooks
end
#cookie_jar ⇒ Object
A Mechanize::CookieJar which stores cookies
# File 'lib/mechanize.rb', line 763
def cookie_jar
  @agent.cookie_jar
end
#cookie_jar=(cookie_jar) ⇒ Object
Replaces the cookie jar with cookie_jar.
# File 'lib/mechanize.rb', line 770
def cookie_jar= cookie_jar
  @agent.cookie_jar = cookie_jar
end
#cookies ⇒ Object
Returns a list of cookies stored in the cookie jar.
# File 'lib/mechanize.rb', line 777
def cookies
  @agent.cookie_jar.to_a
end
#current_page ⇒ Object Also known as: page
Returns the latest page loaded by Mechanize
# File 'lib/mechanize.rb', line 257
def current_page
  @agent.current_page
end
#delete(uri, query_params = {}, headers = {}) ⇒ Object
DELETE uri with query_params, and setting headers:
query_params is formatted into a query string using Mechanize::Util.build_query_string, which see.
delete('http://example/', {'q' => 'foo'}, {})
# File 'lib/mechanize.rb', line 445
def delete(uri, query_params = {}, headers = {})
  page = @agent.fetch(uri, :delete, headers, query_params)
  add_to_history(page)
  page
end
#download(uri, io_or_filename, parameters = [], referer = nil, headers = {}) ⇒ Object
GETs uri and writes it to io_or_filename without recording the request in the history. If io_or_filename does not respond to #write it will be used as a file name. parameters, referer and headers are used as in #get.
By default, if the Content-type of the response matches a Mechanize::File or Mechanize::Page parser, the response body will be loaded into memory before being saved. See #pluggable_parser for details on changing this default.
For alternate ways of downloading files see Mechanize::FileSaver and Mechanize::DirectorySaver.
# File 'lib/mechanize.rb', line 410
def download uri, io_or_filename, parameters = [], referer = nil, headers = {}
  page = transact do
    get uri, parameters, referer, headers
  end

  io = if io_or_filename.respond_to? :write then
         io_or_filename
       else
         ::File.open(io_or_filename, 'wb')
       end

  case page
  when Mechanize::File then
    io.write page.body
  else
    body_io = page.body_io

    until body_io.eof? do
      io.write body_io.read 16384
    end
  end

  page
ensure
  io.close if io and not io_or_filename.respond_to? :write
end
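The io_or_filename rule is plain duck typing: anything that responds to #write is used as given, and anything else is opened as a file. A minimal sketch of that rule on its own, with a hypothetical write_body helper and a StringIO standing in for a caller-supplied IO:

```ruby
require 'stringio'

# Objects that respond to #write are used directly; anything else is treated
# as a file name. Only files opened here are closed here.
def write_body(body, io_or_filename)
  owns_io = !io_or_filename.respond_to?(:write)
  io = owns_io ? File.open(io_or_filename, 'wb') : io_or_filename
  io.write body
ensure
  io.close if io && owns_io
end

buffer = StringIO.new
write_body('hello', buffer)
puts buffer.string  # hello
```

Leaving a caller-supplied IO open lets the caller keep writing to it or rewind it afterwards.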
#follow_meta_refresh ⇒ Object
Follow HTML meta refresh and HTTP Refresh headers. If set to :anywhere, meta refresh tags outside of the head element will be followed.
# File 'lib/mechanize.rb', line 785
def follow_meta_refresh
  @agent.follow_meta_refresh
end
#follow_meta_refresh=(follow) ⇒ Object
Controls following of HTML meta refresh and HTTP Refresh headers in responses.
# File 'lib/mechanize.rb', line 793
def follow_meta_refresh= follow
  @agent.follow_meta_refresh = follow
end
#follow_meta_refresh_self ⇒ Object
Follow an HTML meta refresh and HTTP Refresh headers that have no “url=” in the content attribute.
Defaults to false to prevent infinite refresh loops.
# File 'lib/mechanize.rb', line 803
def follow_meta_refresh_self
  @agent.follow_meta_refresh_self
end
#follow_meta_refresh_self=(follow) ⇒ Object
Alters the following of HTML meta refresh and HTTP Refresh headers that point to the same page.
# File 'lib/mechanize.rb', line 811
def follow_meta_refresh_self= follow
  @agent.follow_meta_refresh_self = follow
end
#get(uri, parameters = [], referer = nil, headers = {}) {|page| ... } ⇒ Object
GET the uri with the given request parameters, referer and headers.
The referer may be a URI or a page.
parameters is formatted into a query string using Mechanize::Util.build_query_string, which see.
# File 'lib/mechanize.rb', line 460
def get(uri, parameters = [], referer = nil, headers = {})
  method = :get

  referer ||=
    if uri.to_s =~ %r{\Ahttps?://}
      Page.new
    else
      current_page || Page.new
    end

  # FIXME: Huge hack so that using a URI as a referer works.  I need to
  # refactor everything to pass around URIs but still support
  # Mechanize::Page#base
  unless Mechanize::Parser === referer then
    referer = if referer.is_a?(String) then
                Page.new URI(referer)
              else
                Page.new referer
              end
  end

  # fetch the page
  headers ||= {}
  page = @agent.fetch uri, method, headers, parameters, referer
  add_to_history(page)
  yield page if block_given?
  page
end
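To see roughly what "formatted into a query string" means for request parameters, the sketch below uses Ruby's standard URI.encode_www_form as a stand-in. It is not Mechanize::Util.build_query_string, whose exact encoding rules may differ, and with_query is a hypothetical helper name.

```ruby
require 'uri'

# Stand-in for the query-string step, using the standard library rather than
# Mechanize::Util.build_query_string.
def with_query(uri, parameters)
  target = URI(uri)
  query = URI.encode_www_form(parameters)
  target.query = query unless query.empty?
  target.to_s
end

puts with_query('http://example/', 'q' => 'foo')  # http://example/?q=foo
```

Both a Hash and an Array of key/value pairs are accepted by URI.encode_www_form, which matches the array default shown in the #get signature.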
#get_file(url) ⇒ Object
GET url and return only its contents.
# File 'lib/mechanize.rb', line 492
def get_file(url)
  get(url).body
end
#gzip_enabled ⇒ Object
Is gzip compression of responses enabled?
# File 'lib/mechanize.rb', line 818
def gzip_enabled
  @agent.gzip_enabled
end
#gzip_enabled=(enabled) ⇒ Object
Disables HTTP/1.1 gzip compression (enabled by default)
# File 'lib/mechanize.rb', line 825
def gzip_enabled=enabled
  @agent.gzip_enabled = enabled
end
#head(uri, query_params = {}, headers = {}) {|page| ... } ⇒ Object
HEAD uri with query_params and headers:
query_params is formatted into a query string using Mechanize::Util.build_query_string, which see.
head('http://example/', {'q' => 'foo'}, {})
# File 'lib/mechanize.rb', line 504
def head(uri, query_params = {}, headers = {})
  page = @agent.fetch uri, :head, headers, query_params

  yield page if block_given?

  page
end
#history ⇒ Object
The history of this mechanize run
# File 'lib/mechanize.rb', line 266
def history
  @agent.history
end
#idle_timeout ⇒ Object
Connections that have not been used in this many seconds will be reset.
# File 'lib/mechanize.rb', line 832
def idle_timeout
  @agent.idle_timeout
end
#idle_timeout=(idle_timeout) ⇒ Object
Sets the idle timeout to idle_timeout. The default timeout is 5 seconds. If you experience “too many connection resets”, reducing this value may help.
# File 'lib/mechanize.rb', line 840
def idle_timeout= idle_timeout
  @agent.idle_timeout = idle_timeout
end
#ignore_bad_chunking ⇒ Object
When set to true mechanize will ignore an EOF during chunked transfer encoding so long as at least one byte was received. Be careful when enabling this as it may cause data loss.
Net::HTTP does not inform mechanize of where in the chunked stream the EOF occurred. Usually it is after the last-chunk but before the terminating CRLF (invalid termination) but it may occur earlier. In the second case your response body may be incomplete.
# File 'lib/mechanize.rb', line 854
def ignore_bad_chunking
  @agent.ignore_bad_chunking
end
#ignore_bad_chunking=(ignore_bad_chunking) ⇒ Object
When set to true mechanize will ignore an EOF during chunked transfer encoding. See ignore_bad_chunking for further details.
# File 'lib/mechanize.rb', line 862
def ignore_bad_chunking= ignore_bad_chunking
  @agent.ignore_bad_chunking = ignore_bad_chunking
end
#keep_alive ⇒ Object
Are HTTP/1.1 keep-alive connections enabled?
# File 'lib/mechanize.rb', line 869
def keep_alive
  @agent.keep_alive
end
#keep_alive=(enable) ⇒ Object
Disable HTTP/1.1 keep-alive connections if enable is set to false. If you are experiencing “too many connection resets” errors, setting this to false will eliminate them.
You should first investigate reducing idle_timeout.
# File 'lib/mechanize.rb', line 880
def keep_alive= enable
  @agent.keep_alive = enable
end
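The order of remedies described above (lower idle_timeout first, disable keep-alive only if resets persist) might be applied like this. The helper name, the still_failing flag, the 2-second value, and the FakeAgent struct are all assumptions for illustration; in real use you would pass a Mechanize instance, which exposes the same two writers.

```ruby
# Works with any object exposing idle_timeout= and keep_alive= writers.
def soften_connection_resets(agent, still_failing: false)
  agent.idle_timeout = 2                     # below the documented 5-second default
  agent.keep_alive = false if still_failing  # last resort
  agent
end

# Stand-in so the sketch runs without mechanize or network access.
FakeAgent = Struct.new(:idle_timeout, :keep_alive)

agent = soften_connection_resets(FakeAgent.new(5, true))
puts agent.idle_timeout  # 2
puts agent.keep_alive    # true
```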
#key ⇒ Object
An OpenSSL private key or the path to a private key
# File 'lib/mechanize.rb', line 1190
def key
  @agent.private_key
end
#key=(key) ⇒ Object
Sets the OpenSSL client key to the given path or key instance. If a path is given, the path must contain an RSA key file.
# File 'lib/mechanize.rb', line 1198
def key= key
  @agent.private_key = key
end
#log ⇒ Object
The current logger. If no logger has been set Mechanize.log is used.
# File 'lib/mechanize.rb', line 887
def log
  @log || Mechanize.log
end
#log=(logger) ⇒ Object
Sets the logger used by this instance of mechanize.
# File 'lib/mechanize.rb', line 894
def log= logger
  @log = logger
end
#max_file_buffer ⇒ Object
Responses larger than this will be written to a Tempfile instead of stored in memory. The default is 100,000 bytes.
A value of nil disables creation of Tempfiles.
# File 'lib/mechanize.rb', line 904
def max_file_buffer
  @agent.max_file_buffer
end
#max_file_buffer=(bytes) ⇒ Object
Sets the maximum size of a response body that will be stored in memory to bytes. A value of nil causes all response bodies to be stored in memory.
Note that for Mechanize::Download subclasses, the maximum buffer size multiplied by the number of pages stored in history (controlled by #max_history) is an approximate upper limit on the amount of memory Mechanize will use. By default, Mechanize can use up to ~5MB to store response bodies for non-File and non-Page (HTML) responses.
See also the discussion under #max_history=.
# File 'lib/mechanize.rb', line 921
def max_file_buffer= bytes
  @agent.max_file_buffer = bytes
end
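The ~5MB figure follows from the two defaults quoted in this documentation: 100,000 bytes per buffered body and 50 history entries. A minimal sketch of that arithmetic, where the helper name history_memory_bound is made up for illustration and is not part of Mechanize:

```ruby
# Approximate upper bound, in bytes, on memory used for buffered response
# bodies: one buffer of at most max_file_buffer bytes per history entry.
# (Hypothetical helper, not a Mechanize method.)
def history_memory_bound(max_file_buffer, max_history)
  max_file_buffer * max_history
end

bound = history_memory_bound(100_000, 50) # defaults stated in the docs
puts "#{bound} bytes (~#{bound / 1_000_000.0} MB)"
```

Raising either setting raises the bound proportionally, which is why the two settings are documented together.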
#max_history ⇒ Object
Maximum number of items allowed in the history. The default setting is 50 pages. Note that the size of the history multiplied by the maximum response body size (see #max_file_buffer=) is an approximate upper limit on the amount of memory Mechanize will use for response bodies.
# File 'lib/mechanize.rb', line 275
def max_history
  @agent.history.max_size
end
#max_history=(length) ⇒ Object
Sets the maximum number of items allowed in the history to length.
Setting the maximum history length to nil will make the history size unlimited. Take care when doing this: mechanize stores response bodies in memory for pages and in the temporary files directory for other responses. For a long-running mechanize program this can be quite large.
See also the discussion under #max_file_buffer=.
# File 'lib/mechanize.rb', line 289
def max_history= length
  @agent.history.max_size = length
end
#open_timeout ⇒ Object
Length of time, in seconds, to wait for a connection to be opened.
# File 'lib/mechanize.rb', line 928
def open_timeout
  @agent.open_timeout
end
#open_timeout=(open_timeout) ⇒ Object
Sets the connection open timeout to open_timeout
# File 'lib/mechanize.rb', line 935
def open_timeout= open_timeout
  @agent.open_timeout = open_timeout
end
#parse(uri, response, body) ⇒ Object
Parses the body of the response from uri using the pluggable parser that matches its content type.
# File 'lib/mechanize.rb', line 1274
def parse uri, response, body
  content_type = nil

  unless response['Content-Type'].nil?
    data, = response['Content-Type'].split ';', 2
    content_type, = data.downcase.split ',', 2 unless data.nil?
  end

  parser_klass = @pluggable_parser.parser content_type

  unless parser_klass <= Mechanize::Download then
    body = case body
           when IO, Tempfile, StringIO then
             body.read
           else
             body
           end
  end

  parser_klass.new uri, response, body, response.code do |parser|
    parser.mech = self if parser.respond_to? :mech=

    parser.watch_for_set = @watch_for_set if
      @watch_for_set and parser.respond_to?(:watch_for_set=)
  end
end
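The first step of parse, reducing the Content-Type header to a bare media type, can be tried in isolation. The sketch below copies the two split calls from the source above; the method name content_type_from and the plain Hash standing in for the response object are assumptions made for illustration.

```ruby
# Mirrors the header handling at the top of #parse.
# `headers` is a plain Hash standing in for the HTTP response object.
def content_type_from(headers)
  return nil if headers['Content-Type'].nil?

  # Drop parameters such as "; charset=UTF-8".
  data, = headers['Content-Type'].split ';', 2
  # Keep only the first value if several are listed, and normalize case.
  content_type, = data.downcase.split ',', 2 unless data.nil?
  content_type
end

p content_type_from('Content-Type' => 'Text/HTML; charset=UTF-8') # => "text/html"
p content_type_from({})                                           # => nil
```

The resulting string (or nil) is what gets passed to the pluggable parser lookup.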
#pass ⇒ Object
OpenSSL client key password
# File 'lib/mechanize.rb', line 1205
def pass
  @agent.pass
end
#pass=(pass) ⇒ Object
Sets the client key password to pass
# File 'lib/mechanize.rb', line 1212
def pass= pass
  @agent.pass = pass
end
#post(uri, query = {}, headers = {}) ⇒ Object
POST to the given uri with the given query.
A String query is sent as the request body via request_with_entity. Any other query is processed using Mechanize::Util.each_parameter (which see), and then encoded into an entity body. If any IO/FileUpload object is specified as a field value the “enctype” will be multipart/form-data, or application/x-www-form-urlencoded otherwise.
Examples:
agent.post '', "foo" => "bar"
agent.post '', [%w[foo bar]]
agent.post('', "<message>hello</message>", 'Content-Type' => 'application/xml')
# File 'lib/mechanize.rb', line 529
def post(uri, query = {}, headers = {})
  return request_with_entity(:post, uri, query, headers) if String === query

  node = {}
  # Create a fake form
  class << node
    def search(*args); []; end
  end
  node['method'] = 'POST'
  node['enctype'] = 'application/x-www-form-urlencoded'

  form = Form.new(node)

  Mechanize::Util.each_parameter(query) { |k, v|
    if v.is_a?(IO)
      form.enctype = 'multipart/form-data'
      ul = Form::FileUpload.new({'name' => k.to_s},::File.basename(v.path))
      ul.file_data = v.read
      form.file_uploads << ul
    elsif v.is_a?(Form::FileUpload)
      form.enctype = 'multipart/form-data'
      form.file_uploads << v
    else
      form.fields << Form::Field.new({'name' => k.to_s},v)
    end
  }
  post_form(uri, form, headers)
end
#post_connect_hooks ⇒ Object
A list of hooks to call after retrieving a response. Hooks are called with the agent, the URI, the response, and the response body.
# File 'lib/mechanize.rb', line 330
def post_connect_hooks
  @agent.post_connect_hooks
end
#pre_connect_hooks ⇒ Object
A list of hooks to call before retrieving a response. Hooks are called with the agent and the request to be performed.
# File 'lib/mechanize.rb', line 338
def pre_connect_hooks
  @agent.pre_connect_hooks
end
#pretty_print(q) ⇒ Object
# File 'lib/mechanize.rb', line 1301
def pretty_print(q) # :nodoc:
  q.object_group(self) {
    q.breakable
    q.pp cookie_jar
    q.breakable
    q.pp current_page
  }
end
#put(uri, entity, headers = {}) ⇒ Object
PUT to uri with entity, setting headers:
put('http://example/', 'new content', {'Content-Type' => 'text/plain'})
# File 'lib/mechanize.rb', line 563
def put(uri, entity, headers = {})
  request_with_entity(:put, uri, entity, headers)
end
#read_timeout ⇒ Object
Length of time to wait for data from the server
# File 'lib/mechanize.rb', line 942
def read_timeout
  @agent.read_timeout
end
#read_timeout=(read_timeout) ⇒ Object
Sets the timeout for each chunk of data read from the server to read_timeout. A single request may read many chunks of data.
# File 'lib/mechanize.rb', line 950
def read_timeout= read_timeout
  @agent.read_timeout = read_timeout
end
#redirect_ok ⇒ Object Also known as: follow_redirect?
Controls how mechanize deals with redirects. The following values are allowed:
- :all, true
All 3xx redirects are followed (default)
- :permanent
Only 301 Moved Permanently redirects are followed
- false
No redirects are followed
# File 'lib/mechanize.rb', line 977
def redirect_ok
  @agent.redirect_ok
end
#redirect_ok=(follow) ⇒ Object Also known as: follow_redirect=
Sets the mechanize redirect handling policy. See redirect_ok for allowed values.
# File 'lib/mechanize.rb', line 987
def redirect_ok= follow
  @agent.redirect_ok = follow
end
#redirection_limit ⇒ Object
Maximum number of redirections to follow
# File 'lib/mechanize.rb', line 996
def redirection_limit
  @agent.redirection_limit
end
#redirection_limit=(limit) ⇒ Object
Sets the maximum number of redirections to follow to limit
# File 'lib/mechanize.rb', line 1003
def redirection_limit= limit
  @agent.redirection_limit = limit
end
#request_headers ⇒ Object
A hash of custom request headers that will be sent on every request
# File 'lib/mechanize.rb', line 1016
def request_headers
  @agent.request_headers
end
#request_headers=(request_headers) ⇒ Object
Replaces the custom request headers that will be sent on every request with request_headers
# File 'lib/mechanize.rb', line 1024
def request_headers= request_headers
  @agent.request_headers = request_headers
end
#request_with_entity(verb, uri, entity, headers = {}) ⇒ Object
Makes an HTTP request to uri using HTTP method verb. entity is used as the request body, if allowed.
# File 'lib/mechanize.rb', line 571
def request_with_entity(verb, uri, entity, headers = {})
  cur_page = current_page || Page.new

  log.debug("query: #{ entity.inspect }") if log

  headers = {
    'Content-Type' => 'application/octet-stream',
    'Content-Length' => entity.size.to_s,
  }.update headers

  page = @agent.fetch uri, verb, headers, [entity], cur_page
  add_to_history(page)
  page
end
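The header handling in the source above relies on Hash#update: the defaults are built first, then any header supplied by the caller replaces the default with the same name. A standalone sketch of just that step, where the method name entity_headers is made up for illustration:

```ruby
# Builds request headers the way #request_with_entity does: defaults first,
# then Hash#update lets any caller-supplied header replace a default.
# (Hypothetical helper, not a Mechanize method.)
def entity_headers(entity, headers = {})
  {
    'Content-Type'   => 'application/octet-stream',
    'Content-Length' => entity.size.to_s,
  }.update headers
end

p entity_headers('new content')
p entity_headers('new content', 'Content-Type' => 'text/plain')
```

This is why the put example can pass 'Content-Type' => 'text/plain' and have it take effect, while Content-Length is still filled in automatically.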
#reset ⇒ Object
Clears history and cookies.
# File 'lib/mechanize.rb', line 1325
def reset
  @agent.reset
end
#resolve(link) ⇒ Object
Resolve the full path of a link / uri
# File 'lib/mechanize.rb', line 1009
def resolve link
  @agent.resolve link
end
#retry_change_requests ⇒ Object
Retry POST and other non-idempotent requests. See RFC 2616 9.1.2.
# File 'lib/mechanize.rb', line 1031
def retry_change_requests
  @agent.retry_change_requests
end
#retry_change_requests=(retry_change_requests) ⇒ Object
By setting retry_change_requests to true you are stating that, for all the URLs you access with mechanize, repeating POST and other non-idempotent requests is safe and will not cause data duplication or other harmful results.
If you are experiencing “too many connection resets” errors you should instead investigate reducing the idle_timeout or disabling keep_alive connections.
# File 'lib/mechanize.rb', line 1045
def retry_change_requests= retry_change_requests
  @agent.retry_change_requests = retry_change_requests
end
#robots ⇒ Object
Will /robots.txt files be obeyed?
# File 'lib/mechanize.rb', line 1052
def robots
  @agent.robots
end
#robots=(enabled) ⇒ Object
When enabled, mechanize will retrieve and obey robots.txt files.
# File 'lib/mechanize.rb', line 1060
def robots= enabled
  @agent.robots = enabled
end
#scheme_handlers ⇒ Object
The handlers for HTTP and other URI protocols.
# File 'lib/mechanize.rb', line 1067
def scheme_handlers
  @agent.scheme_handlers
end
#scheme_handlers=(scheme_handlers) ⇒ Object
Replaces the URI scheme handler table with scheme_handlers
# File 'lib/mechanize.rb', line 1074
def scheme_handlers= scheme_handlers
  @agent.scheme_handlers = scheme_handlers
end
#set_proxy(address, port, user = nil, password = nil) ⇒ Object
Sets the proxy address at port with an optional user and password.
# File 'lib/mechanize.rb', line 1313
def set_proxy address, port, user = nil, password = nil
  @proxy_addr = address
  @proxy_port = port
  @proxy_user = user
  @proxy_pass = password

  @agent.set_proxy address, port, user, password
end
#shutdown ⇒ Object
Shuts down this session by clearing browsing state and closing all persistent connections.
# File 'lib/mechanize.rb', line 1333
def shutdown
  reset
  @agent.shutdown
end
#ssl_version ⇒ Object
SSL version to use.
# File 'lib/mechanize.rb', line 1219
def ssl_version
  @agent.ssl_version
end
#ssl_version=(ssl_version) ⇒ Object
Sets the SSL version to use to version without client/server negotiation.
# File 'lib/mechanize.rb', line 1227
def ssl_version= ssl_version
  @agent.ssl_version = ssl_version
end
#submit(form, button = nil, headers = {}) ⇒ Object
Submits form with an optional button.
Without a button:
page = agent.get('')
agent.submit(page.forms.first)
With a button:
agent.submit(page.forms.first, page.forms.first.buttons.first)
# File 'lib/mechanize.rb', line 598
def submit(form, button = nil, headers = {})
  form.add_button_to_query(button) if button

  case form.method.upcase
  when 'POST'
    post_form(form.action, form, headers)
  when 'GET'
    get(form.action.gsub(/\?[^\?]*$/, ''),
        form.build_query,
        form.page,
        headers)
  else
    raise ArgumentError, "unsupported method: #{form.method.upcase}"
  end
end
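For GET forms, the source above strips any existing query string from the form's action before the form's own fields are appended as the query. The substitution can be tried on its own; the method name strip_query is made up for illustration:

```ruby
# The same substitution #submit applies to form.action for GET forms:
# remove a trailing "?..." so the form's fields become the only query.
# (Hypothetical helper, not a Mechanize method.)
def strip_query(action)
  action.gsub(/\?[^\?]*$/, '')
end

p strip_query('http://example/search?q=old') # => "http://example/search"
p strip_query('http://example/search')       # => "http://example/search"
```

An action without a query string passes through unchanged, so the call is safe for every GET form.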
#transact ⇒ Object
Runs given block, then resets the page history as it was before. self is given as a parameter to the block. Returns the value of the block.
# File 'lib/mechanize.rb', line 618
def transact
  history_backup = @agent.history.dup
  begin
    yield self
  ensure
    @agent.history = history_backup
  end
end
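The snapshot-and-restore behavior of transact can be seen without any network access. The sketch below is a stand-in, not Mechanize itself: the class HistorySketch and its visit method are invented for illustration, and only the transact body follows the source above.

```ruby
# Hypothetical stand-in for an agent; only the history handling is modeled.
class HistorySketch
  attr_reader :history

  def initialize
    @history = []
  end

  # Records a visit (a real agent would fetch the page here).
  def visit(url)
    @history << url
  end

  # Same shape as Mechanize#transact: snapshot, yield self, always restore.
  def transact
    history_backup = @history.dup
    begin
      yield self
    ensure
      @history = history_backup
    end
  end
end

agent = HistorySketch.new
agent.visit 'http://example/'
result = agent.transact do |a|
  a.visit 'http://example/detour'
  a.history.size
end
p result        # => 2
p agent.history # => ["http://example/"]
```

Because the restore sits in an ensure clause, the history is also rolled back when the block raises.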
#user_agent ⇒ Object
The identification string for the client initiating a web request
# File 'lib/mechanize.rb', line 1081
def user_agent
  @agent.user_agent
end
#user_agent=(user_agent) ⇒ Object
Sets the User-Agent used by mechanize to user_agent. See also user_agent_alias.
# File 'lib/mechanize.rb', line 1089
def user_agent= user_agent
  @agent.user_agent = user_agent
end
#user_agent_alias=(name) ⇒ Object
Set the user agent for the Mechanize object based on the given name
# File 'lib/mechanize.rb', line 1098
def user_agent_alias= name
  self.user_agent = AGENT_ALIASES[name] ||
    raise(ArgumentError, "unknown agent alias #{name.inspect}")
end
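The lookup-or-raise idiom in the source above means an unknown alias fails immediately with an ArgumentError rather than silently setting a nil user agent. A minimal sketch of the same idiom, using a made-up alias table (ALIASES_SKETCH and its single entry are illustrative; the real AGENT_ALIASES table is defined in lib/mechanize.rb):

```ruby
# Hypothetical alias table for illustration only.
ALIASES_SKETCH = {
  'Example Browser' => 'ExampleBrowser/1.0 (illustrative)',
}.freeze

# Same idiom as user_agent_alias=: return the mapped string or raise.
def lookup_alias(name)
  ALIASES_SKETCH[name] ||
    raise(ArgumentError, "unknown agent alias #{name.inspect}")
end

puts lookup_alias('Example Browser')

begin
  lookup_alias('No Such Browser')
rescue ArgumentError => e
  puts e.message
end
```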
#verify_callback ⇒ Object
A callback for additional certificate verification. See OpenSSL::SSL::SSLContext#verify_callback.
The callback can be used for debugging or to ignore errors by always returning true. Specifying nil uses the default method that was valid when the SSLContext was created.
# File 'lib/mechanize.rb', line 1239
def verify_callback
  @agent.verify_callback
end
#verify_callback=(verify_callback) ⇒ Object
Sets the OpenSSL certificate verification callback
# File 'lib/mechanize.rb', line 1246
def verify_callback= verify_callback
  @agent.verify_callback = verify_callback
end
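As a sketch of the debugging use mentioned for verify_callback, the callable below follows the OpenSSL::SSL::SSLContext#verify_callback convention: it receives OpenSSL's preliminary verdict and an OpenSSL::X509::StoreContext, and returns true to accept the certificate. The constant name LOGGING_VERIFY_CALLBACK is made up for illustration.

```ruby
require 'openssl'

# Reports verification failures to stderr and returns OpenSSL's own verdict
# unchanged, so certificate checking is observed but not weakened.
LOGGING_VERIFY_CALLBACK = lambda do |preverify_ok, store_context|
  warn "verify error: #{store_context.error_string}" unless preverify_ok
  preverify_ok
end

# The same callable can be set directly on a plain OpenSSL context:
context = OpenSSL::SSL::SSLContext.new
context.verify_callback = LOGGING_VERIFY_CALLBACK

# With a Mechanize instance named agent (not created in this sketch):
#   agent.verify_callback = LOGGING_VERIFY_CALLBACK
```

A callback that always returns true would accept every certificate, so that variant is best kept to debugging sessions.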
#verify_mode ⇒ Object
The OpenSSL server certificate verification method. The default is OpenSSL::SSL::VERIFY_PEER and certificate verification uses the default system certificates. See also cert_store.
# File 'lib/mechanize.rb', line 1255
def verify_mode
  @agent.verify_mode
end
#verify_mode=(verify_mode) ⇒ Object
Sets the OpenSSL server certificate verification method.
# File 'lib/mechanize.rb', line 1262
def verify_mode= verify_mode
  @agent.verify_mode = verify_mode
end
#visited?(url) ⇒ Boolean Also known as: visited_page
Returns a visited page for the url passed in, otherwise nil.
# File 'lib/mechanize.rb', line 296
def visited? url
  url = url.href if url.respond_to? :href

  @agent.visited_page url
end
#write_timeout ⇒ Object
Length of time to wait for data to be sent to the server
# File 'lib/mechanize.rb', line 957
def write_timeout
  @agent.write_timeout
end
#write_timeout=(write_timeout) ⇒ Object
Sets the timeout for each chunk of data to be sent to the server to write_timeout. A single request may write many chunks of data.
# File 'lib/mechanize.rb', line 965
def write_timeout= write_timeout
  @agent.write_timeout = write_timeout
end