Class: Blackbook::Importer::PageScraper
- Defined in:
- lib/blackbook/importer/page_scraper.rb
Overview
A base class for importers that scrape their contacts from web services.
Instance Attribute Summary
- #agent ⇒ Object
Returns the value of attribute agent.
Attributes inherited from Base
Instance Method Summary
- #create_agent ⇒ Object
Creates the Mechanize agent used for scraping and sets a descriptive user agent header for good net etiquette.
- #fetch_contacts! ⇒ Object
Page scrapers follow a simple pattern: instantiate the agent, prepare for the scrape, then run the scrape itself.
- #prepare ⇒ Object
Providers will often require you to log in or otherwise prepare before actually scraping the contacts.
- #scrape_contacts ⇒ Object
Some providers have a single page you can scrape from (like Gmail’s HTML Contacts page) while others might require you to navigate several pages, scraping as you go.
- #strip_html(html) ⇒ Object
Helper to strip HTML tags from text.
Methods inherited from Base
Instance Attribute Details
#agent ⇒ Object
Returns the value of attribute agent.
    # File 'lib/blackbook/importer/page_scraper.rb', line 44
    def agent
      @agent
    end
Instance Method Details
#create_agent ⇒ Object
Creates the Mechanize agent used for scraping and sets a descriptive user agent header for good net etiquette.
    # File 'lib/blackbook/importer/page_scraper.rb', line 50
    def create_agent
      self.agent = WWW::Mechanize.new
      agent.user_agent = "Mozilla/4.0 (compatible; Blackbook #{Blackbook::VERSION})"
      agent.keep_alive = false
      agent
    end
#fetch_contacts! ⇒ Object
Page scrapers follow a simple pattern: instantiate the agent, prepare for the scrape, then run the scrape itself.
    # File 'lib/blackbook/importer/page_scraper.rb', line 61
    def fetch_contacts!
      create_agent
      prepare
      scrape_contacts
    end
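The sequence in fetch_contacts! is a template method: a provider-specific subclass supplies prepare and scrape_contacts, and the base class runs them in order. The following is a minimal sketch of that flow, not the library's own code. It assumes a stand-in base class so that it runs without Blackbook or Mechanize installed; SketchScraper, ExampleProviderScraper, FakeAgent, and the contact hash keys are hypothetical names chosen for illustration.

```ruby
# Stand-in for the real agent (hypothetical): records steps instead of
# making HTTP requests.
class FakeAgent
  attr_reader :log

  def initialize
    @log = []
  end
end

# Hypothetical stand-in that mirrors the call order of
# PageScraper#fetch_contacts!.
class SketchScraper
  attr_accessor :agent

  def create_agent
    self.agent = FakeAgent.new
  end

  def fetch_contacts!
    create_agent
    prepare
    scrape_contacts
  end

  # No-op hooks, meant to be overridden by provider-specific subclasses.
  def prepare; end
  def scrape_contacts; end
end

# Hypothetical provider: "logs in" during prepare, then returns contacts.
class ExampleProviderScraper < SketchScraper
  def prepare
    agent.log << :logged_in
  end

  def scrape_contacts
    agent.log << :scraped
    [{ name: "Jane Doe", email: "jane@example.com" }]
  end
end

scraper = ExampleProviderScraper.new
contacts = scraper.fetch_contacts!
```

Because scrape_contacts is the last call in fetch_contacts!, whatever it returns becomes the return value of fetch_contacts!.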
#prepare ⇒ Object
Providers will often require you to log in or otherwise prepare before actually scraping the contacts.
    # File 'lib/blackbook/importer/page_scraper.rb', line 71
    def prepare; end
#scrape_contacts ⇒ Object
Some providers have a single page you can scrape from (like Gmail’s HTML Contacts page) while others might require you to navigate several pages, scraping as you go.
    # File 'lib/blackbook/importer/page_scraper.rb', line 78
    def scrape_contacts; end
#strip_html(html) ⇒ Object
Helper to strip HTML tags from text.
    # File 'lib/blackbook/importer/page_scraper.rb', line 83
    def strip_html( html )
      html.gsub(/<\/?[^>]*>/, '')
    end
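A short usage sketch may help show what the regular expression does and does not remove. The method is reproduced standalone here so the example runs without Blackbook; the sample markup is made up for illustration.

```ruby
# Same regular expression as PageScraper#strip_html, reproduced standalone.
def strip_html(html)
  html.gsub(/<\/?[^>]*>/, '')
end

# Tags are removed; character entities such as &lt; are left untouched.
plain = strip_html('<td><b>Jane Doe</b> &lt;jane@example.com&gt;</td>')
puts plain
# prints: Jane Doe &lt;jane@example.com&gt;
```

The pattern removes anything between a `<` and the next `>`, so it does not decode entities, and a literal `>` inside an attribute value would end a match early. Callers that need decoded text would have to handle that separately.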