Class: Graybook::Importer::PageScraper

Inherits:
Base
Defined in:
lib/graybook/importer/page_scraper.rb

Overview

A base class for importers that scrape their contacts from web services.

Direct Known Subclasses

Aol, Gmail, Hotmail, Yahoo

Instance Attribute Summary

Attributes inherited from Base

#options

Instance Method Summary

Methods inherited from Base

#=~, #import, #service_name

Instance Attribute Details

#agent ⇒ Object

Returns the value of attribute agent.



# File 'lib/graybook/importer/page_scraper.rb', line 44

def agent
  @agent
end

Instance Method Details

#create_agent ⇒ Object

Creates the Mechanize agent used to do the scraping and sets a nice User-Agent header for good net etiquette.



# File 'lib/graybook/importer/page_scraper.rb', line 50

def create_agent
  self.agent = WWW::Mechanize.new
  agent.user_agent = "Mozilla/4.0 (compatible; Graybook #{Graybook::VERSION})"
  agent.keep_alive = false
  agent
end

#fetch_contacts! ⇒ Object

Page scrapers follow a fairly simple pattern: instantiate the agent, prepare for the scrape, then run the actual scrape. If prepare returns nil, the scrape is skipped and nil is returned.



# File 'lib/graybook/importer/page_scraper.rb', line 61

def fetch_contacts!
  create_agent
  prep = prepare
  return prep if prep.nil?
  scrape_contacts
end
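
The flow above can be tried without any network access. The following is a minimal sketch, not Graybook code: `SketchScraper` and `DemoScraper` are hypothetical stand-ins that mirror the control flow of `fetch_contacts!`, and a plain symbol takes the place of the Mechanize agent.

```ruby
# Hypothetical stand-in mirroring PageScraper#fetch_contacts!; not part of Graybook.
class SketchScraper
  attr_accessor :agent

  def create_agent
    # The real class builds a Mechanize agent here; a symbol stands in for it.
    self.agent = :fake_agent
  end

  def fetch_contacts!
    create_agent
    prep = prepare
    return prep if prep.nil? # a nil result from prepare skips the scrape
    scrape_contacts
  end

  # Hooks for subclasses; both do nothing by default.
  def prepare; end
  def scrape_contacts; end
end

# Hypothetical subclass: prepare succeeds only when a password was supplied.
class DemoScraper < SketchScraper
  def initialize(password)
    @password = password
  end

  def prepare
    @password ? true : nil
  end

  def scrape_contacts
    [{ name: "Ada", email: "ada@example.com" }]
  end
end

ok      = DemoScraper.new("secret").fetch_contacts! # => the contact list
skipped = DemoScraper.new(nil).fetch_contacts!      # => nil, scrape never runs
```

Because the default `prepare` returns nil, a subclass in this sketch has to return a non-nil value from `prepare` for `scrape_contacts` to run at all.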

#prepare ⇒ Object

Providers will often require you to log in or otherwise prepare before actually scraping the contacts. The default implementation does nothing; subclasses override it.



# File 'lib/graybook/importer/page_scraper.rb', line 72

def prepare; end

#scrape_contacts ⇒ Object

Some providers have a single page you can scrape from (like Gmail’s HTML Contacts page) while others might require you to navigate several pages, scraping as you go.



# File 'lib/graybook/importer/page_scraper.rb', line 79

def scrape_contacts; end
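
The multi-page case might look like the loop below. This is only a sketch under stated assumptions: `scrape_all` is a hypothetical helper, and an in-memory hash of made-up URLs stands in for pages that a real subclass would fetch through its agent.

```ruby
# Hypothetical in-memory "site": each page lists contacts and names the next page.
pages = {
  "/contacts?page=1" => { contacts: ["ada@example.com", "alan@example.com"],
                          next: "/contacts?page=2" },
  "/contacts?page=2" => { contacts: ["grace@example.com"], next: nil }
}

# Follows the :next links from the starting page, collecting contacts as it goes.
def scrape_all(pages, start)
  contacts = []
  url = start
  while url
    page = pages.fetch(url)        # a real scraper would request the page here
    contacts.concat(page[:contacts])
    url = page[:next]              # nil on the last page ends the loop
  end
  contacts
end

all_contacts = scrape_all(pages, "/contacts?page=1")
# => ["ada@example.com", "alan@example.com", "grace@example.com"]
```

A single-page provider is the degenerate case of the same loop: the first page has no next link, so the loop body runs once.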

#strip_html(html) ⇒ Object

Helper to strip HTML tags from text.



# File 'lib/graybook/importer/page_scraper.rb', line 84

def strip_html( html )
  html.gsub(/<\/?[^>]*>/, '')
end
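
A short usage sketch shows what the regex does and does not handle. The method body is copied from above so the snippet runs standalone; the input strings are made up for illustration.

```ruby
# Same one-line regex as PageScraper#strip_html, copied so this runs on its own.
def strip_html(html)
  html.gsub(/<\/?[^>]*>/, '')
end

plain    = strip_html('<td><a href="mailto:ada@example.com">Ada</a></td>')
# => "Ada"
entities = strip_html('Tom &amp; Jerry<br/>')
# => "Tom &amp; Jerry"
```

Tags are removed, but character entities such as `&amp;` are left untouched, so callers that need plain text would have to decode those separately.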