Class: CraigScrape

Inherits:

Object


Defined in: lib/libcraigscrape.rb, lib/geo_listings.rb

Overview

A base class encapsulating the various libcraigscrape objects and providing most of the craigslist interaction methods. The old class methods are still supported in a legacy-compatibility mode, but they are marked for deprecation. Instead, create an instance of the CraigScrape object and use its public instance methods. See the README for easy-to-follow examples.

Defined Under Namespace

Classes: GeoListings, Listings, Posting, Scraper

Class Method Summary

.scrape_full_post(post_url) ⇒ Object

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using CraigScrape::Posting.new.
.scrape_listing(listing_url) ⇒ Object

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using CraigScrape::Listings.new.
.scrape_posts(listing_url, count) ⇒ Object

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using the CraigScrape::each_post method.
.scrape_posts_since(listing_url, newer_then) ⇒ Object

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using the CraigScrape::posts_since method.
.scrape_until(listing_url, &post_condition) ⇒ Object

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using the CraigScrape::each_post method.

Instance Method Summary

#each_listing(*fragments) ⇒ Object

Determines all listings which can be construed by combining the sites specified in the object constructor with the provided url-path fragments.
#each_page_in_each_listing(*fragments) ⇒ Object

Determines all listings which can be construed by combining the sites specified in the object constructor with the provided url-path fragments.
#each_post(*fragments) ⇒ Object

Determines all listings which can be construed by combining the sites specified in the object constructor with the provided url-path fragments.
#initialize(*args) ⇒ CraigScrape constructor

Takes a variable number of site/path specifiers (strings) as an argument.
#listings(*fragments) ⇒ Object

Determines all listings which can be construed by combining the sites specified in the object constructor with the provided url-path fragments.
#posts(*fragments) ⇒ Object

Determines all listings which can be construed by combining the sites specified in the object constructor with the provided url-path fragments.
#posts_since(newer_then, *fragments) ⇒ Object

Determines all listings which can be construed by combining the sites specified in the object constructor with the provided url-path fragments.
#sites ⇒ Object

Returns which sites are included in any operations performed by this object.

Constructor Details

#initialize(*args) ⇒ `CraigScrape`

Takes a variable number of site/path specifiers (strings) as arguments. The list is flattened and passed to CraigScrape::GeoListings.find_sites. See that method’s rdoc for the complete set of rules on which arguments are allowed here.




# File 'lib/libcraigscrape.rb', line 38

def initialize(*args)
  @sites_specs = args.flatten
end
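
To illustrate why the constructor accepts separate strings as well as arrays of strings, here is a minimal sketch of the same splat-and-flatten pattern. `SpecHolder` is a hypothetical stand-in, not part of libcraigscrape, and the specifier strings are made-up examples; which specifiers are actually valid is decided by CraigScrape::GeoListings.find_sites.

```ruby
# Hypothetical stand-in that mirrors only the argument handling shown above.
class SpecHolder
  attr_reader :sites_specs

  def initialize(*args)
    # *args collects every argument into one array; flatten removes any
    # nesting, so strings and arrays of strings can be mixed freely.
    @sites_specs = args.flatten
  end
end

# The specifier strings below are illustrative assumptions only.
separate = SpecHolder.new('us/fl/miami', 'us/ca')
nested   = SpecHolder.new(['us/fl/miami', 'us/ca'])
mixed    = SpecHolder.new('us/fl/miami', ['us/ca', ['us/ny']])
```

All three calls end up with a flat array of strings, which is the form the constructor stores in @sites_specs.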

Class Method Details

.scrape_full_post(post_url) ⇒ `Object`

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using CraigScrape::Posting.new

Scrapes a single post URL and returns a Posting object representing its contents. Mostly here to preserve backwards compatibility with the older API; CraigScrape::Posting.new “post_url” does the same thing.




# File 'lib/libcraigscrape.rb', line 159

def scrape_full_post(post_url)
  CraigScrape::Posting.new post_url
end

.scrape_listing(listing_url) ⇒ `Object`

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using CraigScrape::Listings.new

Scrapes a single listing URL and returns a Listings object representing its contents. Mostly here to preserve backwards compatibility with the older API; CraigScrape::Listings.new “listing_url” does the same thing.




# File 'lib/libcraigscrape.rb', line 127

def scrape_listing(listing_url)
  CraigScrape::Listings.new listing_url
end

.scrape_posts(listing_url, count) ⇒ `Object`

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using the CraigScrape::each_post method.

Continually scrapes listings, using the supplied URL as a starting point, until ‘count’ summaries have been retrieved or no more ‘next page’ links are available to be clicked on. Returns an array of PostSummary objects.

# File 'lib/libcraigscrape.rb', line 168

def scrape_posts(listing_url, count)
  count_so_far = 0
  self.scrape_until(listing_url) {|post| count_so_far+=1; count < count_so_far }
end
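
The counting block above is easy to misread, so here is a small sketch of the same logic run against plain arrays instead of scraped pages. `take_until` and `take_first` are hypothetical helpers standing in for scrape_until and scrape_posts; no network access or libcraigscrape classes are involved.

```ruby
# Hypothetical stand-in for scrape_until: collects items until the block
# returns true for one of them.
def take_until(items)
  collected = []
  items.each do |item|
    break if yield(item)
    collected << item
  end
  collected
end

# Hypothetical stand-in for scrape_posts, using the same counting block.
def take_first(items, count)
  count_so_far = 0
  # The counter is incremented before the comparison, so the block first
  # returns true on item number count + 1, leaving exactly count items.
  take_until(items) { |_item| count_so_far += 1; count < count_so_far }
end

first_three = take_first(%w[a b c d e], 3)
short_list  = take_first(%w[a b], 3)
```

When fewer than count items exist, the block never returns true and everything available is returned, which corresponds to running out of ‘next page’ links.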

.scrape_posts_since(listing_url, newer_then) ⇒ `Object`

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using the CraigScrape::posts_since method.

Continually scrapes listings until the date newer_then has been reached or no more ‘next page’ links are available to be clicked on. Returns an array of PostSummary objects. Dates are based on the Month/Day ‘datestamps’ reported in the listing summaries, so time-based cutoffs are not supported here. The scrape_until method, combined with the PostSummary.full_post method, could achieve time-based cutoffs, at the expense of retrieving every post in full during enumeration.

Note: The results will not include post summaries having the newer_then date themselves.




# File 'lib/libcraigscrape.rb', line 182

def scrape_posts_since(listing_url, newer_then)
  self.scrape_until(listing_url) {|post| post.post_date <= newer_then}
end
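
The exclusion of the cutoff date itself follows from the `<=` comparison. The sketch below shows the effect on in-memory data. `Summary` is a hypothetical stand-in for a post summary, the sample data is invented, and the newest-first ordering is an assumption about how listing pages are returned. It uses Enumerable#take_while, which is not what the library does internally but gives the same result for a single ordered list.

```ruby
require 'date'

# Hypothetical stand-in for a post summary; only post_date matters here.
Summary = Struct.new(:title, :post_date)

# Invented sample data, ordered newest first (an assumption).
summaries = [
  Summary.new('sofa', Date.new(2009, 5, 3)),
  Summary.new('bike', Date.new(2009, 5, 2)),
  Summary.new('lamp', Date.new(2009, 5, 1)),
  Summary.new('desk', Date.new(2009, 4, 30))
]

newer_then = Date.new(2009, 5, 1)

# take_while keeps items while the condition holds, which is the inverse of
# stopping as soon as post_date <= newer_then.
recent = summaries.take_while { |post| post.post_date > newer_then }
titles = recent.map(&:title)
```

Here the ‘lamp’ summary, dated exactly newer_then, is left out along with everything older.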

.scrape_until(listing_url, &post_condition) ⇒ `Object`

This method is for legacy compatibility and is not recommended for use by new projects. Instead, consider using the CraigScrape::each_post method.

Continually scrapes listings, using the supplied URL as a starting point, until the supplied block returns true or there are no more ‘next page’ links available to click on. Returns an array of the posts collected before the scrape stopped.

# File 'lib/libcraigscrape.rb', line 136

def scrape_until(listing_url, &post_condition)
  ret = []

  listings = CraigScrape::Listings.new listing_url
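  # A symbol is used as the tag: throw finds its catch by object identity,
  # and two separate "ScrapeBreak" string literals are distinct objects.
  catch :ScrapeBreak do
    while listings do
      listings.posts.each do |post|
        throw :ScrapeBreak if post_condition.call(post)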
        ret << post
      end

      listings = listings.next_page
    end
  end

  ret
end
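
The catch/throw pagination pattern can be tried without any network access. In this sketch, `Page` and `collect_until` are hypothetical stand-ins for CraigScrape::Listings and scrape_until; the only assumption carried over from the source is that a page exposes posts and a next_page that is nil on the last page.

```ruby
# Hypothetical stand-in for a listings page: some posts plus a link to the
# next page (nil on the last page).
Page = Struct.new(:posts, :next_page)

def collect_until(first_page)
  collected = []
  page = first_page
  # catch/throw exits both the inner each and the outer while in one step.
  catch(:stop) do
    while page
      page.posts.each do |post|
        throw :stop if yield(post)
        collected << post
      end
      page = page.next_page
    end
  end
  collected
end

last_page  = Page.new([4, 5, 6], nil)
first_page = Page.new([1, 2, 3], last_page)

# Stops in the middle of the second page.
stopped_early = collect_until(first_page) { |post| post >= 5 }
# The condition is never met, so the loop ends when next_page is nil.
ran_out = collect_until(first_page) { |post| post > 100 }
```

The post that triggers the condition is not included in the result, matching the order of the throw and the append in the method above.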