Class: Ripli::CustomParserTemplate

Inherits: CustomParser

Inheritance chain: Object, CustomParser, Ripli::CustomParserTemplate

Defined in: lib/ripli/customparser_template.rb

Overview

The class should inherit from CustomParser, and its name should be related to the site name.

Constant Summary

CONSTANT = 'Your constants'

From the superclass you inherit these constants:
LOG_DIR = 'log' -> directory where files with proxies are saved
DEFAULT_MAX_TIMEOUT = 1000 -> maximum proxy response timeout, in ms

Constants inherited from CustomParser

Ripli::CustomParser::DEFAULT_MAX_TIMEOUT, Ripli::CustomParser::DEFAULT_MECHANIZE_TIMEOUT, Ripli::CustomParser::LOG_DIR

Instance Method Summary

#initialize ⇒ CustomParserTemplate constructor

Define it if you need to initialize instance variables or perform some preparation (creating directories, etc.).
#parse(type, opts = {}) ⇒ Object

Required method! The site-scraping logic must live here.
type – proxy type: [:https, :socks4, :socks5]
opts – additional params, if you need them
return – array of strings in the format "<type>\t<ip>\t\t<port>"

Methods inherited from CustomParser

#shell_exec!

Constructor Details

#initialize ⇒ `CustomParserTemplate`

Define it if you need to initialize instance variables or perform some preparation (creating directories, etc.).

# File 'lib/ripli/customparser_template.rb', line 18

def initialize
  super # required: creates the logger and the log directory
  # define @mechanize = Mechanize.new { |agent| agent.open_timeout...} if you need to add options to the Mechanize agent
  # your code here
end
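
A minimal sketch of a constructor that follows the advice above; it is an illustration, not the gem's own code. `RipliSketch::CustomParser` is a stand-in so the snippet runs without the gem installed (in a real parser you would inherit from `Ripli::CustomParser`), and the class name `ExampleSiteParser`, the `CACHE_DIR` constant, and the URL are hypothetical.

```ruby
require 'fileutils'
require 'logger'
require 'tmpdir'

module RipliSketch
  # Stand-in for Ripli::CustomParser. ASSUMPTION: the real constructor
  # creates @logger and the log directory, as the `super` comment says.
  class CustomParser
    def initialize
      @logger = Logger.new($stdout)
    end
  end

  # Hypothetical site-specific parser; all names below are made up.
  class ExampleSiteParser < CustomParser
    CACHE_DIR = File.join(Dir.tmpdir, 'example_site_cache')

    attr_reader :cache_dir, :base_url

    def initialize
      super # keep the superclass setup (logger, directories)
      @base_url = 'https://example.com/proxies' # placeholder URL
      @cache_dir = CACHE_DIR
      FileUtils.mkdir_p(@cache_dir) # does nothing if the directory exists
    end
  end
end
```

Calling `super` first keeps whatever the superclass prepares available to the rest of the constructor.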

Instance Method Details

#parse(type, opts = {}) ⇒ `Object`

Required method! The site-scraping logic must live here.
type – proxy type: [:https, :socks4, :socks5]
opts – additional params, if you need them
return – array of strings in the format "<type>\t<ip>\t\t<port>"

# File 'lib/ripli/customparser_template.rb', line 29

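def parse(type, opts = {})
  # for downloading use @mechanize.get(url)
  @logger.info 'Use @logger to print logs to STDOUT'
  [] # the last expression is the return value: an array of proxy strings
rescue Net::OpenTimeout, Net::ReadTimeout
  # rescue exceptions raised while downloading a page, DEFAULT_MECHANIZE_TIMEOUT=10s
  [] # return an empty list instead of nil when the download times out
end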
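
The following sketch shows one way a `parse` body could build the return value. It assumes the fields in the documented format are separated by tab characters (one after the type, two after the IP). To stay runnable offline it reads the page text from a hypothetical `opts[:body]` key instead of calling `@mechanize.get(url)`, and `RipliSketch::ExampleListParser` is a made-up standalone class; a real parser would inherit from `Ripli::CustomParser`.

```ruby
require 'logger'

module RipliSketch
  # Hypothetical parser for a page that lists proxies as "ip:port" text.
  class ExampleListParser
    # Matches IPv4:port pairs such as "10.0.0.1:8080".
    # The octets and port are not range-checked.
    IP_PORT = /\b(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})\b/

    def initialize
      @logger = Logger.new($stdout)
    end

    # type: proxy type symbol, e.g. :socks5
    # opts[:body]: page text to scan (assumption made for this sketch)
    def parse(type, opts = {})
      body = opts.fetch(:body, '')
      lines = body.scan(IP_PORT).map do |ip, port|
        "#{type}\t#{ip}\t\t#{port}"
      end
      @logger.info "found #{lines.size} proxies"
      lines # last expression, so this array is the return value
    end
  end
end

parser = RipliSketch::ExampleListParser.new
result = parser.parse(:socks5, body: "10.0.0.1:1080\n192.168.1.5:9050")
```

Keeping the array as the last expression matters: Ruby returns the value of the final statement, so a log call placed after it would replace the result.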