1.4.2 / 2012-04-12
0.4.1 / 2011-12-08
- Catch OpenSSL::SSL::SSLError exceptions when initiating HTTPS sessions.
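The 0.4.1 entry can be illustrated with a small guard. The Ruby sketch below runs a block that is expected to open an HTTPS session. If the block raises OpenSSL::SSL::SSLError, the guard returns nil instead of letting the error escape. The helper name with_ssl_rescue is hypothetical and is not part of Spidr. The commented Net::HTTP call only shows where such a guard might sit.

```ruby
require 'openssl'

# Hypothetical helper (not Spidr's API): runs the given block, which is
# expected to open an HTTPS session, and returns nil if the TLS layer raises.
def with_ssl_rescue
  yield
rescue OpenSSL::SSL::SSLError => e
  warn "SSL error: #{e.message}"
  nil
end

# Where the guard could sit (needs 'net/http' and network access):
#   with_ssl_rescue do
#     Net::HTTP.start(host, 443, use_ssl: true) { |http| http.get('/') }
#   end
```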
0.4.0 / 2011-08-07
0.3.2 / 2011-06-20
0.3.1 / 2011-04-22
- Require 'set' in spidr/headers.rb.
0.3.0 / 2011-04-14
0.2.7 / 2010-08-17
- Added Spidr::CookieJar#cookies_for_host (thanks zapnap).
- Renamed Spidr::Page#cookie to Spidr::Page#raw_cookie.
- Rescue URI::InvalidComponentError exceptions in Spidr::Page#to_absolute (thanks zapnap).
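The sketch below shows host-based cookie lookup in the spirit of cookies_for_host, combined with the parent-domain inheritance added in 0.2.5. It assumes the jar is a plain Hash that maps domain names to cookie Hashes. That layout and the function are hypothetical and do not describe Spidr::CookieJar's internals.

```ruby
# Hypothetical sketch, not Spidr::CookieJar: the jar is a Hash of
# { "domain" => { "cookie_name" => "value" } }.
def cookies_for_host(jar, host)
  labels = host.split('.')
  cookies = {}

  # Walk from the broadest parent domain (e.g. "example.com") down to the
  # host itself, so cookies set on the more specific host win on name clashes.
  [labels.length - 2, 0].max.downto(0) do |i|
    domain = labels[i..-1].join('.')
    cookies.merge!(jar.fetch(domain, {}))
  end

  cookies
end
```

This sketch treats the last two labels as the broadest domain. It therefore ignores public suffixes such as co.uk.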
0.2.6 / 2010-07-05
- Fixed a bug in Spidr::Page#meta_redirect by calling Nokogiri::XML::Element#get_attribute instead of attr.
0.2.5 / 2010-07-02
- Added Spidr::Page#meta_redirect.
- Added Spidr::Page#meta_redirect?.
- Manage development dependencies with Bundler.
- Support following "old-school" meta-refresh redirects (thanks zapnap).
- Allow Spidr::CookieJar to inherit cookies set by a parent domain.
- Fixed a constant lookup issue in Spidr::Agent.
- Use yield instead of block.call when necessary.
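The meta-refresh entries refer to tags such as <meta http-equiv="refresh" content="5; url=http://example.com/">. The sketch below pulls the target URL out of the content attribute. parse_meta_refresh is a hypothetical helper. It is not how Spidr::Page#meta_redirect is implemented.

```ruby
# Hypothetical helper: extracts the redirect URL from a meta-refresh
# content attribute such as "5; url=http://example.com/".
def parse_meta_refresh(content)
  # Split only on the first ';' so URLs containing ';' stay intact.
  _delay, rest = content.split(';', 2)
  return nil unless rest

  # Accept "url=..." in any case, with optional spaces and quotes.
  match = rest.strip.match(/\Aurl\s*=\s*['"]?([^'"]+)['"]?\z/i)
  match && match[1].strip
end
```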
0.2.4 / 2010-05-05
0.2.3 / 2010-02-27
0.2.2 / 2010-01-06
0.2.1 / 2009-11-25
0.2.0 / 2009-10-10
- Added URI.expand_path.
- Added Spidr::Page#search.
- Added Spidr::Page#at.
- Added Spidr::Page#title.
- Added Spidr::Agent#failures=.
- Added an HTTP session cache to Spidr::Agent, per suggestion of falter.
- Added Spidr::Agent#get_session.
- Added Spidr::Agent#kill_session.
- Added Spidr.proxy=.
- Added Spidr.disable_proxy!.
- Aliased Spidr::Page#txt? to Spidr::Page#plain_text?.
- Aliased Spidr::Page#ok? to Spidr::Page#is_ok?.
- Aliased Spidr::Page#redirect? to Spidr::Page#is_redirect?.
- Aliased Spidr::Page#unauthorized? to Spidr::Page#is_unauthorized?.
- Aliased Spidr::Page#forbidden? to Spidr::Page#is_forbidden?.
- Aliased Spidr::Page#missing? to Spidr::Page#is_missing?.
- Split URL filtering code out of Spidr::Agent and into
Spidr::Filters.
- Split URL / Page event code out of Spidr::Agent and into
Spidr::Events.
- Split pause! / continue! / skip_link! / skip_page! methods out of
Spidr::Agent and into Spidr::Actions.
- Fixed a bug in Spidr::Page#code where it did not return an Integer.
- Make sure Spidr::Page#doc returns Nokogiri::XML::Document objects for RSS/RDF/Atom pages as well.
- Fixed the handling of the Location header in Spidr::Page#links (thanks falter).
- Fixed a bug in Spidr::Page#to_absolute where trailing / characters on URI paths were not being preserved (thanks falter).
- Fixed a bug where the URI query was not being sent with the request
in Spidr::Agent#get_page (thanks Damian Steer).
- Fixed a bug where SSL sessions were not being properly set up
(thanks falter).
- Switched Spidr::Agent#history to be a Set, to speed up lookups in
the history (thanks falter).
- Switched Spidr::Agent#failures to a Set.
- Allow a block to be passed to Spidr::Agent#run, which will receive all
pages visited.
- Allow Spidr::Agent#start_at and Spidr::Agent#continue! to pass blocks to Spidr::Agent#run.
- Made Spidr::Agent#visit_page public.
- Moved to YARD based documentation.
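When the history and failures lists became Sets, membership checks changed from a linear Array scan to a hash lookup. The sketch below shows that bookkeeping and yields each newly seen URL to a block, much like the pages passed to the block given to #run. The visit helper is illustrative only and is not Spidr::Agent's code.

```ruby
require 'set'

# Hypothetical helper: records the URL in a Set-based history and yields it
# only on the first visit. Set#add? returns nil when the URL is already present.
def visit(history, url)
  return false unless history.add?(url)

  yield url if block_given?
  true
end
```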
0.1.9 / 2009-06-13
- Upgraded to Hoe 2.0.0.
- Use Hoe.spec instead of Hoe.new.
- Use the Hoe signing task for signed gems.
- Added the Spidr::Agent#schemes and Spidr::Agent#schemes= methods.
- Added a warning message if 'net/https' cannot be loaded.
- Allow the list of acceptable URL schemes to be passed into
Spidr::Agent#initialize.
- Allow history and queue information to be passed into
Spidr::Agent#initialize.
- Spidr::Agent#start_at no longer clears the history or the queue.
- Fixed a bug in the sanitization of semi-escaped URLs.
- Fixed a bug where https URLs would be followed even if 'net/https'
could not be loaded.
- Removed Spidr::Agent::SCHEMES.
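In 0.1.9 the accepted URL schemes became configurable, and https is dropped when 'net/https' cannot be loaded. The sketch below expresses that logic in plain Ruby. default_schemes and follow_scheme? are hypothetical names, not Spidr::Agent methods.

```ruby
require 'uri'

# Hypothetical: build the list of schemes a crawler may follow,
# adding https only when 'net/https' can be loaded.
def default_schemes
  schemes = ['http']
  begin
    require 'net/https'
    schemes << 'https'
  rescue LoadError
    warn "Warning: cannot load 'net/https', https URLs will be ignored"
  end
  schemes
end

# Hypothetical: true when the URL's scheme is in the accepted list.
def follow_scheme?(url, schemes)
  scheme = URI.parse(url).scheme
  !scheme.nil? && schemes.include?(scheme.downcase)
end
```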
0.1.8 / 2009-05-27
0.1.7 / 2009-04-24
- Added Spidr::Agent#all_headers.
- Fixed a bug where Spidr::Page#headers was always nil.
- Spidr::Agent will now follow the Location header in HTTP 300,
301, 302, 303 and 307 Redirects.
- Spidr::Agent will now follow iframe and frame tags.
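The 0.1.7 redirect entry can be sketched as follows. When the status code is one of the listed codes, the Location header is resolved against the current page URL. REDIRECT_CODES and redirect_target are hypothetical names. A real crawler would read the status and header from its HTTP response object.

```ruby
require 'uri'

# Status codes named in the 0.1.7 entry (hypothetical constant name).
REDIRECT_CODES = [300, 301, 302, 303, 307].freeze

# Hypothetical helper: returns the absolute URL to follow, or nil when the
# response is not one of the listed redirects or has no Location header.
def redirect_target(page_url, code, headers)
  return nil unless REDIRECT_CODES.include?(code)

  location = headers['Location']
  location && URI.join(page_url, location).to_s
end
```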
0.1.6 / 2009-04-14
- Added Spidr::Agent#failures, a list of URLs which could not be visited.
- Added Spidr::Agent#failed?.
- Added Spidr::Agent#every_failed_url.
- Added Spidr::Agent#clear, which clears the history and failures URL
lists.
- Improved fault tolerance in Spidr::Agent#get_page.
  - If a network or HTTP error is encountered, the URL will be added to
    the failures list and the next URL will be visited.
- Fixed a typo in Spidr::Agent#ignore_exts_like.
- Updated the Web Spider Obstacle Course with links that always fail to be
visited.
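The fault-tolerance entry describes recording a URL as failed and then moving on. The sketch below does this with a caller-supplied fetch callable. It stores failures in a Set, which matches the later 0.2.0 change. The crawl function and the set of rescued exception classes are assumptions for illustration, not Spidr::Agent#get_page's actual code.

```ruby
require 'set'
require 'timeout'

# Hypothetical sketch: fetch every URL with the given callable, recording
# network failures in a Set and continuing with the next URL.
def crawl(urls, fetch)
  pages = {}
  failures = Set.new

  urls.each do |url|
    begin
      pages[url] = fetch.call(url)
    rescue SystemCallError, Timeout::Error, IOError
      failures << url # remember the failure, then move on
    end
  end

  [pages, failures]
end
```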
0.1.5 / 2009-03-22
- Catch malformed URIs in Spidr::Page#to_absolute and return nil.
- Filter out nil URIs in Spidr::Page#urls.
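The 0.1.5 entries pair a nil return for malformed links with filtering those nils out of the URL list. The sketch below shows the same pattern using URI.join and rescuing URI::Error, the common superclass of URI::InvalidURIError and URI::InvalidComponentError. to_absolute and absolute_urls here are hypothetical top-level stand-ins, not the Spidr::Page methods.

```ruby
require 'uri'

# Hypothetical helper: resolve a link against the page URL, returning nil
# instead of raising when either URI is malformed.
def to_absolute(base, link)
  URI.join(base, link).to_s
rescue URI::Error
  nil
end

# Resolve every link and drop the ones that could not be parsed.
def absolute_urls(base, links)
  links.map { |link| to_absolute(base, link) }.compact
end
```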
0.1.4 / 2009-01-15
- Use Nokogiri for HTML and XML parsing.
0.1.3 / 2009-01-10
0.1.2 / 2008-11-06
- Fixed a bug in Spidr::Page#to_absolute where URLs with no path were not
receiving a default path of /.
- Fixed a bug in Spidr::Page#to_absolute where URL paths were not being
expanded, in order to remove .. and . directories.
- Fixed a bug where absolute URLs could have a blank path, thus causing
Spidr::Agent#get_page to crash when it performed the HTTP request.
- Added RSpec spec tests.
- Created a Web-Spider Obstacle Course
(http://spidr.rubyforge.org/course/start.html) which is used in the spec
tests.
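The 0.1.2 path fixes supply a default path of / and remove . and .. segments. This resembles the dot-segment removal described in RFC 3986, section 5.2.4. The simplified sketch below treats every input as an absolute URL path. It is not the code behind Spidr's URI.expand_path.

```ruby
# Hypothetical sketch of dot-segment removal for absolute URL paths.
def expand_path(path)
  str = path.to_s
  stack = []

  str.split('/').each do |segment|
    case segment
    when '', '.' then next      # skip empty and current-directory segments
    when '..'    then stack.pop # go up one level (no-op at the root)
    else              stack.push(segment)
    end
  end

  expanded = '/' + stack.join('/')
  # Keep a trailing slash (or a final "." / "..") as a directory marker.
  expanded += '/' if !stack.empty? && str.end_with?('/', '/.', '/..')
  expanded
end
```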
0.1.1 / 2008-10-04
0.1.0 / 2008-05-23
- Initial release.
- Black-list or white-list URLs based upon:
  - Host name
  - Port number
  - Full link
  - URL extension
- Provides callbacks for:
  - Every visited Page.
  - Every visited URL.
  - Every visited URL that matches a specified pattern.
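The black-list and white-list rules from the initial release can be sketched as one accept/reject pair per criterion (host, port, full link, extension). Patterns are compared with ===, so Strings, Regexps and Integers all work. The Rules class and visit_url? are hypothetical and are not Spidr's filtering API.

```ruby
require 'uri'

# Hypothetical white-list / black-list rule set; not Spidr's API.
class Rules
  def initialize(accept: [], reject: [])
    @accept = accept
    @reject = reject
  end

  # An empty accept list means "accept anything that is not rejected".
  def accept?(value)
    allowed = @accept.empty? || @accept.any? { |pattern| pattern === value }
    allowed && @reject.none? { |pattern| pattern === value }
  end
end

# Hypothetical: apply host, port, full-link and extension rules to one URL.
def visit_url?(url, hosts:, ports:, links:, exts:)
  uri = URI.parse(url)
  ext = File.extname(uri.path.to_s).sub(/\A\./, '')

  hosts.accept?(uri.host) &&
    ports.accept?(uri.port) &&
    links.accept?(url) &&
    exts.accept?(ext)
end
```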