Class: CraigScrape
- Inherits: Object
- Defined in: lib/libcraigscrape.rb, lib/geo_listings.rb
Overview
A base class encapsulating the various libcraigscrape objects and providing most of the craigslist interaction methods. Currently, the old class methods are supported in a legacy-compatibility mode, but they are marked for deprecation. Instead, create an instance of the CraigScrape object and use its public instance methods. See the README for easy-to-follow examples.
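For example, a minimal sketch of the recommended instance-based usage (the site specifier and the ‘sss’ path fragment below are illustrative placeholders, not prescribed by this class):

require 'libcraigscrape'

# Build a scraper over one or more craigslist sites, then enumerate posts
# under a url-path fragment:
scraper = CraigScrape.new 'us/fl/miami'
scraper.each_post('sss') { |post| puts post.post_date }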
Defined Under Namespace
Classes: GeoListings, Listings, Posting, Scraper
Class Method Summary

- .scrape_full_post(post_url) ⇒ Object
  This method is for legacy compatibility and is not recommended for new projects. Instead, consider using CraigScrape::Posting.new.
- .scrape_listing(listing_url) ⇒ Object
  This method is for legacy compatibility and is not recommended for new projects. Instead, consider using CraigScrape::Listings.new.
- .scrape_posts(listing_url, count) ⇒ Object
  This method is for legacy compatibility and is not recommended for new projects. Instead, consider using the CraigScrape#each_post method.
- .scrape_posts_since(listing_url, newer_then) ⇒ Object
  This method is for legacy compatibility and is not recommended for new projects. Instead, consider using the CraigScrape#posts_since method.
- .scrape_until(listing_url, &post_condition) ⇒ Object
  This method is for legacy compatibility and is not recommended for new projects. Instead, consider using the CraigScrape#each_post method.
Instance Method Summary

- #each_listing(*fragments) ⇒ Object
  Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments, and passes the first page of each to the provided block.
- #each_page_in_each_listing(*fragments) ⇒ Object
  Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments, and passes every page of every listing to the provided block.
- #each_post(*fragments) ⇒ Object
  Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments, and passes every post to the provided block.
- #initialize(*args) ⇒ CraigScrape (constructor)
  Takes a variable number of site/path specifiers (strings) as arguments.
- #listings(*fragments) ⇒ Object
  Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments, and returns the first page of each.
- #posts(*fragments) ⇒ Object
  Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments, and returns all posts from these listings, newest first.
- #posts_since(newer_then, *fragments) ⇒ Object
  Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments, and returns all posts newer than the provided newer_then date.
- #sites ⇒ Object
  Returns which sites are included in any operations performed by this object.
Constructor Details
#initialize(*args) ⇒ CraigScrape
Takes a variable number of site/path specifiers (strings) as arguments. This list gets flattened and passed to CraigScrape::GeoListings.find_sites. See that method’s rdoc for a complete set of rules on what arguments are allowed here.
# File 'lib/libcraigscrape.rb', line 36

def initialize(*args)
  @sites_specs = args.flatten
end
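For example (the specifier strings below are illustrative; GeoListings.find_sites defines what is actually accepted):

# Multiple specifiers, including nested arrays, are flattened into one list:
scraper = CraigScrape.new 'us/fl/miami', ['us/fl/keys']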
Class Method Details
.scrape_full_post(post_url) ⇒ Object
This method is for legacy compatibility and is not recommended for new projects. Instead, consider using CraigScrape::Posting.new.
Scrapes a single post url and returns a Posting object representing its contents. Mostly here to preserve backwards-compatibility with the older API; CraigScrape::Posting.new “post_url” does the same thing.
# File 'lib/libcraigscrape.rb', line 165

def scrape_full_post(post_url)
  CraigScrape::Posting.new post_url
end
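Legacy usage sketch (the post url is a placeholder):

post = CraigScrape.scrape_full_post 'http://example.craigslist.org/sys/1234.html'
# Equivalent, non-deprecated form:
post = CraigScrape::Posting.new 'http://example.craigslist.org/sys/1234.html'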
.scrape_listing(listing_url) ⇒ Object
This method is for legacy compatibility and is not recommended for new projects. Instead, consider using CraigScrape::Listings.new.
Scrapes a single listing url and returns a Listings object representing its contents. Mostly here to preserve backwards-compatibility with the older API; CraigScrape::Listings.new “listing_url” does the same thing.
# File 'lib/libcraigscrape.rb', line 133

def scrape_listing(listing_url)
  CraigScrape::Listings.new listing_url
end
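Legacy usage sketch (the listing url is a placeholder):

listing = CraigScrape.scrape_listing 'http://example.craigslist.org/sss/'
# Equivalent, non-deprecated form:
listing = CraigScrape::Listings.new 'http://example.craigslist.org/sss/'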
.scrape_posts(listing_url, count) ⇒ Object
This method is for legacy compatibility and is not recommended for new projects. Instead, consider using the CraigScrape#each_post method.
Continually scrapes listings, using the supplied url as a starting point, until ‘count’ summaries have been retrieved or no more ‘next page’ links are available to be clicked on. Returns an array of PostSummary objects.
# File 'lib/libcraigscrape.rb', line 174

def scrape_posts(listing_url, count)
  count_so_far = 0
  self.scrape_until(listing_url) { |post| count_so_far += 1; count < count_so_far }
end
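For example, to collect the first 50 post summaries (the listing url is a placeholder):

posts = CraigScrape.scrape_posts 'http://example.craigslist.org/sss/', 50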
.scrape_posts_since(listing_url, newer_then) ⇒ Object
This method is for legacy compatibility and is not recommended for new projects. Instead, consider using the CraigScrape#posts_since method.
Continually scrapes listings until the date newer_then has been reached, or no more ‘next page’ links are available to be clicked on. Returns an array of PostSummary objects. Dates are based on the Month/Day ‘datestamps’ reported in the listing summaries; as such, time-based cutoffs are not supported here. The scrape_until method, utilizing the SummaryPost.full_post method, could achieve time-based cutoffs at the expense of retrieving every post in full during enumeration.
Note: The results will not include post summaries having the newer_then date themselves.
# File 'lib/libcraigscrape.rb', line 188

def scrape_posts_since(listing_url, newer_then)
  self.scrape_until(listing_url) { |post| post.post_date <= newer_then }
end
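For example, to collect summaries dated after yesterday (the listing url is a placeholder):

require 'date'

posts = CraigScrape.scrape_posts_since 'http://example.craigslist.org/sss/', Date.today - 1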
.scrape_until(listing_url, &post_condition) ⇒ Object
This method is for legacy compatibility and is not recommended for new projects. Instead, consider using the CraigScrape#each_post method.
Continually scrapes listings, using the supplied url as a starting point, until the supplied block returns true or there are no more ‘next page’ links available to click on.
# File 'lib/libcraigscrape.rb', line 142

def scrape_until(listing_url, &post_condition)
  ret = []

  listings = CraigScrape::Listings.new listing_url

  catch "ScrapeBreak" do
    while listings do
      listings.posts.each do |post|
        throw "ScrapeBreak" if post_condition.call(post)
        ret << post
      end

      listings = listings.next_page
    end
  end

  ret
end
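A sketch of the block form; scraping stops at the first post that satisfies the condition (the listing url and the one-week cutoff are placeholders):

require 'date'

cutoff = Date.today - 7
posts = CraigScrape.scrape_until('http://example.craigslist.org/sss/') do |post|
  post.post_date <= cutoff
end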
Instance Method Details
#each_listing(*fragments) ⇒ Object
Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments.
Passes the first page listing of each of these urls to the provided block.
# File 'lib/libcraigscrape.rb', line 51

def each_listing(*fragments)
  listing_urls_for(fragments).each { |url| yield Listings.new(url) }
end
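Usage sketch (the site specifier and ‘sss’ fragment are illustrative):

scraper = CraigScrape.new 'us/fl/miami'
scraper.each_listing('sss') { |listing| puts listing.posts.length }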
#each_page_in_each_listing(*fragments) ⇒ Object
Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments.
Passes every page of every one of these listings to the provided block.
# File 'lib/libcraigscrape.rb', line 59

def each_page_in_each_listing(*fragments)
  each_listing(*fragments) do |listing|
    while listing
      yield listing
      listing = listing.next_page
    end
  end
end
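Usage sketch (same illustrative specifier and fragment as above); this yields every page rather than just the first:

scraper = CraigScrape.new 'us/fl/miami'
scraper.each_page_in_each_listing('sss') { |page| puts page.posts.length }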
#each_post(*fragments) ⇒ Object
Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments.
Passes all posts from each of these urls to the provided block, in the order they’re parsed (for each listing, newest posts are returned first).
# File 'lib/libcraigscrape.rb', line 81

def each_post(*fragments)
  each_page_in_each_listing(*fragments) { |l| l.posts.each { |p| yield p } }
end
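Usage sketch (illustrative specifier and fragment); posts are yielded one at a time, without building an intermediate array:

scraper = CraigScrape.new 'us/fl/miami'
scraper.each_post('sss') { |post| puts post.post_date }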
#listings(*fragments) ⇒ Object
Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments.
Returns the first page listing of each of these urls.
# File 'lib/libcraigscrape.rb', line 72

def listings(*fragments)
  listing_urls_for(fragments).collect { |url| Listings.new url }
end
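Usage sketch; unlike each_listing, this returns an array rather than yielding to a block:

scraper = CraigScrape.new 'us/fl/miami'
first_pages = scraper.listings 'sss'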
#posts(*fragments) ⇒ Object
Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments.
Returns all posts from each of these urls, in the order they’re parsed (newest posts first).
# File 'lib/libcraigscrape.rb', line 90

def posts(*fragments)
  ret = []
  each_page_in_each_listing(*fragments) { |l| ret += l.posts }
  ret
end
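Usage sketch; equivalent to collecting every post yielded by each_post into one array:

scraper = CraigScrape.new 'us/fl/miami'
all_posts = scraper.posts 'sss'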
#posts_since(newer_then, *fragments) ⇒ Object
Determines all listings which can be constructed by combining the sites specified in the object constructor with the provided url-path fragments.
Returns all posts from each of these urls which are newer than the provided ‘newer_then’ date (newest posts first).
NOTE: New to version 1.1: if newer_then is a Date, we compare against post_date; if newer_then is a Time, we compare against post_time. Be aware that post_time requires the entire post to be loaded, not just the summary, which takes longer to download.
# File 'lib/libcraigscrape.rb', line 106

def posts_since(newer_then, *fragments)
  accessor = (newer_then.kind_of? Date) ? :post_date : :post_time

  ret = []
  fragments.each do |frag|
    each_post(frag) do |p|
      # We have to try the comparison, since post_time could conceivably be nil
      # for the case of a system_post?
      break if p.send(accessor).try(:<=, newer_then)
      ret << p
    end
  end
  ret
end
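For example (per the note above, passing a Date compares against post_date; passing a Time would compare against post_time and force each full post to be downloaded):

require 'date'

scraper = CraigScrape.new 'us/fl/miami'
recent = scraper.posts_since Date.today - 3, 'sss'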
#sites ⇒ Object
Returns which sites are included in any operations performed by this object. This is directly ascertained from the initial constructor’s spec-list.
# File 'lib/libcraigscrape.rb', line 42

def sites
  @sites ||= GeoListings.find_sites @sites_specs
  @sites
end
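Usage sketch (illustrative specifier):

scraper = CraigScrape.new 'us/fl/miami'
scraper.sites   # => the site list resolved by GeoListings.find_sites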