Class: ScraperUtils::MechanizeUtils::AgentConfig
- Inherits: Object
  - Object
  - ScraperUtils::MechanizeUtils::AgentConfig
- Defined in: lib/scraper_utils/mechanize_utils/agent_config.rb
Overview
Configuration for a Mechanize agent with sensible defaults and configurable settings. Supports global configuration through AgentConfig.configure and per-instance overrides.
Constant Summary
- DEFAULT_TIMEOUT = 60
Class Attribute Summary
- .default_australian_proxy ⇒ Boolean
  Default flag for Australian proxy preference.
- .default_disable_ssl_certificate_check ⇒ Boolean
  Default setting for SSL certificate verification.
- .default_timeout ⇒ Integer
  Default timeout in seconds for agent connections.
- .default_user_agent ⇒ String?
  Default Mechanize user agent.
Instance Attribute Summary
- #max_load ⇒ Object (readonly)
  Give access for testing.
- #random_range ⇒ Object (readonly)
  Give access for testing.
- #user_agent ⇒ String (readonly)
  User agent string.
Class Method Summary
- .configure {|self| ... } ⇒ void
  Configure default settings for all AgentConfig instances.
- .reset_defaults! ⇒ void
  Reset all configuration options to their default values.
Instance Method Summary
- #configure_agent(agent) ⇒ void
  Configures a Mechanize agent with these settings.
- #initialize(timeout: nil, compliant_mode: nil, random_delay: nil, max_load: nil, disable_ssl_certificate_check: nil, australian_proxy: nil, user_agent: nil) ⇒ AgentConfig (constructor)
  Creates Mechanize agent configuration with sensible defaults overridable via configure.
Constructor Details
#initialize(timeout: nil, compliant_mode: nil, random_delay: nil, max_load: nil, disable_ssl_certificate_check: nil, australian_proxy: nil, user_agent: nil) ⇒ AgentConfig
Creates a Mechanize agent configuration with sensible defaults that can be overridden via AgentConfig.configure.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 77
def initialize(timeout: nil, compliant_mode: nil, random_delay: nil, max_load: nil, disable_ssl_certificate_check: nil, australian_proxy: nil, user_agent: nil)
  @timeout = timeout.nil? ? self.class.default_timeout : timeout
  @user_agent = user_agent.nil? ? self.class.default_user_agent : user_agent
  @disable_ssl_certificate_check = if disable_ssl_certificate_check.nil?
                                     self.class.default_disable_ssl_certificate_check
                                   else
                                     disable_ssl_certificate_check
                                   end
  @australian_proxy = if australian_proxy.nil?
                        self.class.default_australian_proxy
                      else
                        australian_proxy
                      end

  # Validate proxy URL format if proxy will be used
  @australian_proxy &&= !ScraperUtils.australian_proxy.to_s.empty?
  if @australian_proxy
    uri = begin
      URI.parse(ScraperUtils.australian_proxy.to_s)
    rescue URI::InvalidURIError => e
      raise URI::InvalidURIError, "Invalid proxy URL format: #{e}"
    end
    unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
      raise URI::InvalidURIError, "Proxy URL must start with http:// or https://"
    end
    unless !uri.host.to_s.empty? && uri.port&.positive?
      raise URI::InvalidURIError, "Proxy URL must include host and port"
    end
  end

  if @random_delay&.positive?
    min_random = Math.sqrt(@random_delay * 3.0 / 13.0)
    @random_range = min_random.round(3)..(3 * min_random).round(3)
  end

  today = Date.today.strftime("%Y-%m-%d")
  @user_agent = ENV.fetch("MORPH_USER_AGENT", nil)&.sub("TODAY", today)
  version = ScraperUtils::VERSION
  @user_agent ||= "Mozilla/5.0 (compatible; ScraperUtils/#{version} #{today}; +https://github.com/ianheggie-oaf/scraper_utils)"
end
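The proxy checks in the constructor can be tried on their own. The following is a minimal sketch, assuming a standalone helper named validate_proxy_url (a hypothetical name, not part of ScraperUtils) that takes the URL as an argument instead of reading ScraperUtils.australian_proxy. It applies the same three checks in the same order and returns the parsed URI on success.

```ruby
require "uri"

# Hypothetical helper (not part of ScraperUtils) mirroring the
# constructor's proxy checks. Returns the parsed URI or raises
# URI::InvalidURIError with the same messages as the listing.
def validate_proxy_url(proxy_url)
  uri = begin
    URI.parse(proxy_url.to_s)
  rescue URI::InvalidURIError => e
    raise URI::InvalidURIError, "Invalid proxy URL format: #{e}"
  end
  # Only http and https schemes are accepted.
  unless uri.is_a?(URI::HTTP) || uri.is_a?(URI::HTTPS)
    raise URI::InvalidURIError, "Proxy URL must start with http:// or https://"
  end
  # A usable proxy needs a non-empty host and a positive port.
  unless !uri.host.to_s.empty? && uri.port&.positive?
    raise URI::InvalidURIError, "Proxy URL must include host and port"
  end
  uri
end

uri = validate_proxy_url("http://user:secret@proxy.example.com:8080")
puts "#{uri.host}:#{uri.port}" # prints proxy.example.com:8080
```

A URL with another scheme, such as ftp://proxy.example.com:21, parses successfully but fails the second check, so the scheme error is raised rather than the format error.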
Class Attribute Details
.default_australian_proxy ⇒ Boolean
Returns the default flag for Australian proxy preference.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 36
def default_australian_proxy
  @default_australian_proxy
end
.default_disable_ssl_certificate_check ⇒ Boolean
Returns the default for whether SSL certificate verification is disabled.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 33
def default_disable_ssl_certificate_check
  @default_disable_ssl_certificate_check
end
.default_timeout ⇒ Integer
Returns the default timeout in seconds for agent connections.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 30
def default_timeout
  @default_timeout
end
.default_user_agent ⇒ String?
Returns the default Mechanize user agent, or nil if none is configured.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 39
def default_user_agent
  @default_user_agent
end
Instance Attribute Details
#max_load ⇒ Object (readonly)
Exposed for testing.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 70
def max_load
  @max_load
end
#random_range ⇒ Object (readonly)
Exposed for testing.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 70
def random_range
  @random_range
end
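The range arithmetic from the constructor listing can be reproduced in isolation. This is a sketch, assuming a hypothetical helper random_range_for that takes the delay as an explicit argument. The upper bound is three times the lower bound. The 3/13 factor matches the fact that the mean of x² for x uniform on [m, 3m] is 13m²/3, which suggests the sampled value is squared so that the average comes out at random_delay; that reading is an inference and is not stated in this page.

```ruby
# Hypothetical helper (not part of ScraperUtils) reproducing the
# constructor's range arithmetic for a given random_delay.
def random_range_for(random_delay)
  # nil, zero and negative delays produce no range, as in the listing.
  return nil unless random_delay&.positive?

  min_random = Math.sqrt(random_delay * 3.0 / 13.0)
  min_random.round(3)..(3 * min_random).round(3)
end

puts random_range_for(13).inspect # prints 1.732..5.196
puts random_range_for(0).inspect  # prints nil
```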
#user_agent ⇒ String (readonly)
Returns the user agent string.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 66
def user_agent
  @user_agent
end
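The way the constructor listing derives this string from MORPH_USER_AGENT can be sketched as a pure function. The helper name build_user_agent and its env, version and today parameters are assumptions made for illustration; the real code reads ENV, ScraperUtils::VERSION and Date.today directly.

```ruby
require "date"

# Hypothetical helper (not part of ScraperUtils). `env` may be ENV or a
# plain Hash; `version` stands in for ScraperUtils::VERSION.
def build_user_agent(env: ENV, version: "0.0.0", today: Date.today)
  stamp = today.strftime("%Y-%m-%d")
  # The first occurrence of the literal text TODAY becomes the date.
  custom = env.fetch("MORPH_USER_AGENT", nil)&.sub("TODAY", stamp)
  custom || "Mozilla/5.0 (compatible; ScraperUtils/#{version} #{stamp}; +https://github.com/ianheggie-oaf/scraper_utils)"
end

day = Date.new(2025, 1, 2)
puts build_user_agent(env: { "MORPH_USER_AGENT" => "MyBot/1.0 TODAY" }, today: day)
# prints MyBot/1.0 2025-01-02
```

With MORPH_USER_AGENT unset, the function falls back to the "Mozilla/5.0 (compatible; ScraperUtils/...)" string with the version and date filled in.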
Class Method Details
.configure {|self| ... } ⇒ void
This method returns an undefined value.
Configure default settings for all AgentConfig instances. The block receives the AgentConfig class itself.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 48
def configure
  yield self if block_given?
end
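The interplay between class-level defaults and per-instance overrides can be illustrated with a small stand-in. DemoConfig below is a hypothetical class written for this example, not the real AgentConfig; it copies only the configure, reset_defaults! and timeout fallback logic shown on this page.

```ruby
# Stand-in class (NOT the real AgentConfig) showing the pattern of
# class-level defaults with per-instance overrides.
class DemoConfig
  DEFAULT_TIMEOUT = 60

  class << self
    attr_accessor :default_timeout

    # Yields the class so the block can assign default_* attributes.
    def configure
      yield self if block_given?
    end

    def reset_defaults!
      @default_timeout = DEFAULT_TIMEOUT
    end
  end

  reset_defaults!

  attr_reader :timeout

  def initialize(timeout: nil)
    # nil means "use the class-level default".
    @timeout = timeout.nil? ? self.class.default_timeout : timeout
  end
end

DemoConfig.configure { |config| config.default_timeout = 90 }
puts DemoConfig.new.timeout              # prints 90
puts DemoConfig.new(timeout: 15).timeout # prints 15
DemoConfig.reset_defaults!
puts DemoConfig.new.timeout              # prints 60
```

Because the defaults live on the class, a configure call affects every instance created afterwards, which is why tests commonly call reset_defaults! between cases.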
.reset_defaults! ⇒ void
This method returns an undefined value.
Reset all configuration options to their default values, taking overrides from the MORPH_* environment variables where they are set.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 54
def reset_defaults!
  @default_timeout = ENV.fetch('MORPH_CLIENT_TIMEOUT', DEFAULT_TIMEOUT).to_i # 60
  @default_disable_ssl_certificate_check = !ENV.fetch('MORPH_DISABLE_SSL_CHECK', nil).to_s.empty? # false
  @default_australian_proxy = !ENV.fetch('MORPH_USE_PROXY', nil).to_s.empty? # false
  @default_user_agent = ENV.fetch('MORPH_USER_AGENT', nil) # Uses Mechanize user agent
end
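The environment parsing above has two behaviours worth seeing concretely: the boolean flags test only for a non-empty value, and the timeout goes through String#to_i. The sketch below assumes a hypothetical helper defaults_from_env that takes a Hash in place of ENV so the rules can be exercised without touching the process environment.

```ruby
# Hypothetical helper (not part of ScraperUtils) applying the same
# parsing rules as reset_defaults! to a Hash standing in for ENV.
def defaults_from_env(env)
  {
    timeout: env.fetch("MORPH_CLIENT_TIMEOUT", 60).to_i,
    # Any non-empty value enables the flag, including "false" or "0".
    disable_ssl_certificate_check: !env.fetch("MORPH_DISABLE_SSL_CHECK", nil).to_s.empty?,
    australian_proxy: !env.fetch("MORPH_USE_PROXY", nil).to_s.empty?,
    user_agent: env.fetch("MORPH_USER_AGENT", nil)
  }
end

settings = defaults_from_env(
  "MORPH_CLIENT_TIMEOUT" => "30",
  "MORPH_DISABLE_SSL_CHECK" => "false"
)
puts settings[:timeout]                       # prints 30
puts settings[:disable_ssl_certificate_check] # prints true
```

Setting MORPH_DISABLE_SSL_CHECK=false therefore still disables the check; the variable has to be unset or empty to leave verification on. Similarly, a non-numeric MORPH_CLIENT_TIMEOUT becomes 0, because String#to_i returns 0 when the string does not start with a number.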
Instance Method Details
#configure_agent(agent) ⇒ void
This method returns an undefined value.
Configures a Mechanize agent with these settings.
# File 'lib/scraper_utils/mechanize_utils/agent_config.rb', line 130
def configure_agent(agent)
  agent.verify_mode = OpenSSL::SSL::VERIFY_NONE if @disable_ssl_certificate_check

  if @timeout
    agent.open_timeout = @timeout
    agent.read_timeout = @timeout
  end
  agent.user_agent = user_agent
  agent.request_headers ||= {}
  agent.request_headers["Accept"] = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
  agent.request_headers["Upgrade-Insecure-Requests"] = "1"
  if @australian_proxy
    agent.agent.set_proxy(ScraperUtils.australian_proxy)
    agent.request_headers["Accept-Language"] = "en-AU,en-US;q=0.9,en;q=0.8"
    verify_proxy_works(agent)
  end

  agent.pre_connect_hooks << method(:pre_connect_hook)
  agent.post_connect_hooks << method(:post_connect_hook)
end