Class: WaybackArchiver::Sitemap
- Inherits: Object (Object > WaybackArchiver::Sitemap)
- Defined in: lib/wayback_archiver/sitemap.rb
Overview
Parses sitemaps that follow the Sitemaps protocol (www.sitemaps.org).
Instance Attribute Summary
- #document ⇒ Object (readonly)
  Returns the value of attribute document.
Instance Method Summary
- #initialize(xml_or_string, strict: false) ⇒ Sitemap (constructor)
  Returns a new instance of Sitemap.
- #plain_document? ⇒ Boolean
  Checks whether the sitemap is a plain file.
- #root_name ⇒ String
  Returns the name of the document's root element (if there is one).
- #sitemap_index? ⇒ Boolean
  Returns true if the sitemap is a sitemap index.
- #sitemaps ⇒ Array<String>
  Returns all sitemap URLs defined in the sitemap.
- #urls ⇒ Array<String>
  Returns all URLs defined in the sitemap.
- #urlset? ⇒ Boolean
  Returns true if the sitemap lists regular URLs.
Constructor Details
#initialize(xml_or_string, strict: false) ⇒ Sitemap
Returns a new instance of Sitemap.
# File 'lib/wayback_archiver/sitemap.rb', line 9
def initialize(xml_or_string, strict: false)
  @contents = xml_or_string
  @document = REXML::Document.new(xml_or_string)
rescue REXML::ParseException => _e
  raise if strict
  @document = REXML::Document.new('')
end
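The effect of the strict flag can be sketched with REXML alone. The helper parse_sitemap below is a hypothetical stand-in that copies the constructor's rescue logic; it is not part of WaybackArchiver, and the malformed input is an invented example.

```ruby
require 'rexml/document'

# Hypothetical stand-in mirroring the constructor's rescue logic;
# it is not part of WaybackArchiver.
def parse_sitemap(xml_or_string, strict: false)
  REXML::Document.new(xml_or_string)
rescue REXML::ParseException
  raise if strict # strict mode: surface the parse error to the caller

  REXML::Document.new('') # lenient mode: fall back to an empty document
end

malformed = '<urlset><url></urlset>' # <url> is never closed

# Lenient parsing swallows the error and yields a document without a root.
lenient = parse_sitemap(malformed)
puts "lenient root missing? #{lenient.root.nil?}"

# Strict parsing re-raises the REXML error.
begin
  parse_sitemap(malformed, strict: true)
rescue REXML::ParseException => e
  puts "strict parse failed with #{e.class}"
end
```

With the default strict: false, a caller probably cannot tell malformed XML apart from empty input, since both end up as an empty document.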
Instance Attribute Details
#document ⇒ Object (readonly)
Returns the value of attribute document.
# File 'lib/wayback_archiver/sitemap.rb', line 7
def document
  @document
end
Instance Method Details
#plain_document? ⇒ Boolean
Checks whether the sitemap is a plain file, that is, whether the parsed document contains no XML elements.
# File 'lib/wayback_archiver/sitemap.rb', line 38
def plain_document?
  document.elements.empty?
end
#root_name ⇒ String
Returns the name of the document's root element, or nil if the document has no root.
# File 'lib/wayback_archiver/sitemap.rb', line 44
def root_name
  return unless document.root
  document.root.name
end
#sitemap_index? ⇒ Boolean
Returns true if the sitemap is a sitemap index.
# File 'lib/wayback_archiver/sitemap.rb', line 55
def sitemap_index?
  root_name == 'sitemapindex'
end
#sitemaps ⇒ Array<String>
Return all sitemap URLs defined in Sitemap.
# File 'lib/wayback_archiver/sitemap.rb', line 32
def sitemaps
  @sitemaps ||= extract_urls('sitemap')
end
#urls ⇒ Array<String>
Return all URLs defined in Sitemap.
# File 'lib/wayback_archiver/sitemap.rb', line 23
def urls
  @urls ||= extract_urls('url')
end
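Both #sitemaps and #urls delegate to extract_urls, a private helper that this page does not show. As a rough sketch of what such extraction might look like, the hypothetical extract_locs below collects the text of each loc child under entries with a given element name. The traversal and the sample URLs are assumptions, not the library's implementation.

```ruby
require 'rexml/document'

# Hypothetical illustration only: the real extract_urls is private and
# not shown on this page, so this traversal is an assumption.
def extract_locs(document, element_name)
  return [] unless document.root

  locs = []
  document.root.each_element do |entry|
    next unless entry.name == element_name # e.g. 'url' or 'sitemap'

    entry.each_element do |child|
      locs << child.text.to_s.strip if child.name == 'loc'
    end
  end
  locs
end

xml = <<~XML
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://example.com/</loc></url>
    <url><loc>https://example.com/about</loc></url>
  </urlset>
XML

document = REXML::Document.new(xml)
puts extract_locs(document, 'url')          # entries of a regular urlset
puts extract_locs(document, 'sitemap').size # no <sitemap> entries in this document
```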
#urlset? ⇒ Boolean
Returns true if the sitemap lists regular URLs.
# File 'lib/wayback_archiver/sitemap.rb', line 64
def urlset?
  root_name == 'urlset'
end
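Taken together, the predicates above classify a sitemap by its root element. A minimal sketch of that dispatch, using REXML directly: sitemap_kind and its return symbols are hypothetical names introduced for illustration, and the lenient fallback assumes the constructor's default strict: false.

```ruby
require 'rexml/document'

# Hypothetical helper: classifies sitemap content by its root element,
# mirroring plain_document?, sitemap_index? and urlset?.
def sitemap_kind(xml)
  document = begin
    REXML::Document.new(xml)
  rescue REXML::ParseException
    REXML::Document.new('') # same lenient fallback as the constructor
  end

  return :plain if document.elements.empty? # no XML elements at all

  case document.root.name
  when 'sitemapindex' then :index
  when 'urlset' then :urlset
  else :unknown
  end
end

index_xml = '<sitemapindex><sitemap><loc>https://example.com/a.xml</loc></sitemap></sitemapindex>'
urlset_xml = '<urlset><url><loc>https://example.com/</loc></url></urlset>'
plain_text = "https://example.com/\nhttps://example.com/about\n"

puts sitemap_kind(index_xml)
puts sitemap_kind(urlset_xml)
puts sitemap_kind(plain_text)
```

A caller would typically follow the sitemap URLs of an index recursively, read the URLs of a urlset directly, and treat a plain document as a line-separated URL list.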