Class: Ripli::CustomParserTemplate

Inherits:
CustomParser
Defined in:
lib/ripli/customparser_template.rb

Overview

The class should inherit from CustomParser. The class name should relate to the name of the site it scrapes.

Constant Summary

CONSTANT =

From the superclass you inherit these constants:
LOG_DIR = 'log' -> directory to save files with proxies
DEFAULT_MAX_TIMEOUT = 1000 -> max timeout of proxy response, in ms

'Your constants'

Constants inherited from CustomParser

Ripli::CustomParser::DEFAULT_MAX_TIMEOUT, Ripli::CustomParser::DEFAULT_MECHANIZE_TIMEOUT, Ripli::CustomParser::LOG_DIR

Instance Method Summary

Methods inherited from CustomParser

#shell_exec!

Constructor Details

#initialize ⇒ CustomParserTemplate

Define it if you need to initialize instance variables or perform some preparation (creating directories, etc.).



# File 'lib/ripli/customparser_template.rb', line 18

def initialize
  super # required for creating logger and directory
  # define @mechanize = Mechanize.new { |agent| agent.open_timeout... } if you need to add options to the mechanize agent
  # your code here
end
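
As an illustration of the preparation step described above, here is a minimal sketch of a subclass constructor. It is not taken from the gem: the stand-in `Ripli::CustomParser`, the `ExampleSiteParser` class, the `PAGES` constant, and the `cache_dir` argument are all hypothetical, and the stand-in base class exists only so the snippet runs without the gem installed.

```ruby
require 'fileutils'
require 'logger'
require 'tmpdir'

module Ripli
  # Stand-in for the real base class (assumption), so the sketch is self-contained.
  class CustomParser
    LOG_DIR = 'log'

    def initialize
      @logger = Logger.new($stdout)
    end
  end

  # Hypothetical parser for an imaginary site.
  class ExampleSiteParser < CustomParser
    PAGES = 3 # hypothetical constant of your own

    attr_reader :cache_dir

    def initialize(cache_dir = File.join(Dir.tmpdir, 'ripli_example_cache'))
      super() # keep the setup done by the base class
      @cache_dir = cache_dir
      FileUtils.mkdir_p(@cache_dir) # preparation step: create a working directory
    end
  end
end
```

Because this constructor takes an argument and the stand-in base constructor takes none, the sketch calls `super()` with explicit parentheses. The bare `super` in the template works because its `initialize` has no arguments.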

Instance Method Details

#parse(type, opts = {}) ⇒ Object

Required method! The site-scraping logic must live here.
type – proxy type: [:https, :socks4, :socks5]
opts – additional params, if you need them
return – array of strings in the format: "<type>\t<ip>\t\t<port>"



# File 'lib/ripli/customparser_template.rb', line 29

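def parse(type, opts = {})
  # for downloading use @mechanize.get(url)
  @logger.info 'Use @logger to print logs to STDOUT'
  [] # the array of proxy strings must be the last expression so it is returned
rescue Net::OpenTimeout, Net::ReadTimeout
  # rescue exceptions raised while downloading the page, DEFAULT_MECHANIZE_TIMEOUT=10s
  [] # return an empty array when the download times out
end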
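
To make the return contract of `#parse` more concrete, here is a hedged sketch of a parser that turns `ip:port` lines into result strings. Several things are assumptions rather than gem behavior: the stand-in base class, the `ExampleSiteParser` name, the `SAMPLE_BODY` text, the `:body` option, and the tab-separated layout `<type>`, tab, `<ip>`, two tabs, `<port>`. A real parser would fetch the page with `@mechanize.get(url)` instead of reading an inline string.

```ruby
require 'logger'

module Ripli
  # Stand-in for the real base class (assumption), so the sketch is self-contained.
  class CustomParser
    def initialize
      @logger = Logger.new($stdout)
    end
  end

  # Hypothetical parser for an imaginary site that lists one "ip:port" per line.
  class ExampleSiteParser < CustomParser
    # Hypothetical page body used instead of a network request.
    SAMPLE_BODY = "10.0.0.1:8080\nnot a proxy\n10.0.0.2:3128\n"

    def parse(type, opts = {})
      body = opts.fetch(:body, SAMPLE_BODY)
      # Keep only lines that look like "a.b.c.d:port"; scan yields [ip, port] pairs.
      proxies = body.scan(/^(\d{1,3}(?:\.\d{1,3}){3}):(\d{1,5})$/).map do |ip, port|
        "#{type}\t#{ip}\t\t#{port}" # assumed tab-separated output format
      end
      @logger.info "Found #{proxies.size} proxies"
      proxies # last expression, so this array is the return value
    end
  end
end
```

Calling `Ripli::ExampleSiteParser.new.parse(:https)` on the sample body yields two strings, one per valid line, and skips the line that does not match.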