Class: WaybackArchiver::Sitemap
- Inherits: Object
- Defined in: lib/wayback_archiver/sitemap.rb
Overview
Parses sitemaps as defined at www.sitemaps.org.
Instance Attribute Summary
- #document ⇒ Object (readonly)
  Returns the value of attribute document.
Instance Method Summary
- #initialize(xml, strict: false) ⇒ Sitemap (constructor)
  Returns a new instance of Sitemap.
- #plain_document? ⇒ Boolean
  Checks if the sitemap is a plain file.
- #root_name ⇒ String
  Returns the name of the document's root element, or nil if there is none.
- #sitemap_index? ⇒ Boolean
  Returns true if the sitemap is a sitemap index.
- #sitemaps ⇒ Array<String>
  Returns all sitemap URLs listed in the sitemap.
- #urls ⇒ Array<String>
  Returns all URLs listed in the sitemap.
- #urlset? ⇒ Boolean
  Returns true if the sitemap lists regular URLs.
Constructor Details
#initialize(xml, strict: false) ⇒ Sitemap
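Returns a new instance of Sitemap. The xml argument is parsed with REXML::Document.new. If parsing raises REXML::ParseException, the exception is re-raised when strict: true; otherwise an empty document is used instead.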
```ruby
# File 'lib/wayback_archiver/sitemap.rb', line 8

def initialize(xml, strict: false)
  @document = REXML::Document.new(xml)
rescue REXML::ParseException => _e
  raise if strict

  @document = REXML::Document.new('')
end
```
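The lenient and strict behaviour of the constructor can be tried out in isolation. The sketch below copies the rescue logic shown above into a standalone helper; parse_sitemap_xml is a hypothetical name, not part of WaybackArchiver, and the example assumes the rexml gem (bundled with Ruby) is available.

```ruby
require 'rexml/document'

# Hypothetical helper mirroring Sitemap#initialize: parse the XML and, on a
# parse error, either re-raise (strict) or fall back to an empty document.
def parse_sitemap_xml(xml, strict: false)
  REXML::Document.new(xml)
rescue REXML::ParseException
  raise if strict

  REXML::Document.new('')
end

broken = '<urlset></url>' # mismatched end tag

lenient = parse_sitemap_xml(broken)
puts lenient.root.inspect # nil: the empty fallback document has no root

begin
  parse_sitemap_xml(broken, strict: true)
rescue REXML::ParseException => e
  puts "strict mode raised #{e.class}"
end
```

Swallowing the error by default lets callers treat an unparseable sitemap as "no URLs" instead of aborting a whole run; strict: true is the opt-in for callers that would rather fail loudly.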
Instance Attribute Details
#document ⇒ Object (readonly)
Returns the value of attribute document: the parsed REXML::Document (an empty document if lenient parsing failed).
```ruby
# File 'lib/wayback_archiver/sitemap.rb', line 6

def document
  @document
end
```
Instance Method Details
#plain_document? ⇒ Boolean
Checks if the sitemap is a plain file, i.e. the parsed document contains no XML elements.
```ruby
# File 'lib/wayback_archiver/sitemap.rb', line 36

def plain_document?
  document.elements.empty?
end
```
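sitemaps.org also allows a sitemap to be a plain text file with one URL per line, which is the case plain_document? detects. The sketch below pairs a standalone version of the check with a hypothetical plain_text_urls fallback; neither helper name comes from WaybackArchiver. Whether REXML rejects plain text outright or parses it as bare text, the lenient fallback leaves no elements, so the check returns true either way.

```ruby
require 'rexml/document'

# Hypothetical helper: parse leniently (as Sitemap#initialize does without
# strict) and report whether the result contains any XML elements.
def plain_document?(input)
  doc = begin
    REXML::Document.new(input)
  rescue REXML::ParseException
    REXML::Document.new('')
  end
  doc.elements.empty?
end

# Hypothetical fallback for plain-text sitemaps: keep every non-blank line,
# with surrounding whitespace removed.
def plain_text_urls(input)
  input.each_line.map(&:strip).reject(&:empty?)
end

text = "https://example.com/\nhttps://example.com/about\n"
xml  = '<urlset><url><loc>https://example.com/</loc></url></urlset>'

plain_document?(text) # => true
plain_document?(xml)  # => false
plain_text_urls(text) # => ["https://example.com/", "https://example.com/about"]
```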
#root_name ⇒ String
Returns the name of the document's root element, or nil if the document has no root.
```ruby
# File 'lib/wayback_archiver/sitemap.rb', line 42

def root_name
  return unless document.root

  document.root.name
end
```
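The guard clause matters because the lenient constructor can leave an empty document whose root is nil. A minimal standalone sketch of the same logic (root_name here is a hypothetical top-level function that takes the document as an argument) shows both cases, including a sitemap that declares the sitemaps.org default namespace:

```ruby
require 'rexml/document'

# Hypothetical standalone version of Sitemap#root_name: the root element's
# name, or nil when the document has no root element.
def root_name(document)
  return unless document.root

  document.root.name
end

index = REXML::Document.new(<<~XML)
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap>
  </sitemapindex>
XML

root_name(index)                   # => "sitemapindex"
root_name(REXML::Document.new('')) # => nil
```

On Ruby 2.3 or later, document.root&.name is an equivalent one-liner.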
#sitemap_index? ⇒ Boolean
Returns true if the sitemap is a sitemap index, i.e. its root element is sitemapindex.
```ruby
# File 'lib/wayback_archiver/sitemap.rb', line 53

def sitemap_index?
  root_name == 'sitemapindex'
end
```
#sitemaps ⇒ Array<String>
Returns all sitemap URLs listed in the sitemap. The result is memoized.
```ruby
# File 'lib/wayback_archiver/sitemap.rb', line 30

def sitemaps
  @sitemaps ||= extract_urls('sitemap')
end
```
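Both sitemaps and urls delegate to a private extract_urls helper whose body is not shown on this page. The sketch below is an assumption about what such a helper could do, not the gem's implementation: for each child of the root named tag, it collects the text of that child's loc elements. It matches on local element names rather than XPath so that the sitemaps.org default namespace needs no special handling.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the private extract_urls helper. For every child
# of the root element named `tag`, collect the stripped text of its <loc>
# elements, skipping empty ones.
def extract_urls(document, tag)
  urls = []
  return urls unless document.root

  document.root.each_element do |entry|
    next unless entry.name == tag

    entry.each_element do |child|
      next unless child.name == 'loc'

      text = child.text.to_s.strip
      urls << text unless text.empty?
    end
  end
  urls
end

index = REXML::Document.new(<<~XML)
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
    <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
  </sitemapindex>
XML

extract_urls(index, 'sitemap')
# => ["https://example.com/sitemap-posts.xml", "https://example.com/sitemap-pages.xml"]
```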
#urls ⇒ Array<String>
Returns all URLs listed in the sitemap. The result is memoized.
```ruby
# File 'lib/wayback_archiver/sitemap.rb', line 21

def urls
  @urls ||= extract_urls('url')
end
```
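sitemap_index?, sitemaps and urls together are enough to walk a sitemap index down to its page URLs. The sketch below is one possible walk, not WaybackArchiver's own crawler: PAGES, locs and collect_urls are hypothetical names, and an in-memory hash stands in for HTTP fetching. A seen hash guards against indexes that list the same sitemap twice or refer back to themselves.

```ruby
require 'rexml/document'

# Hypothetical in-memory "web": URL => response body.
PAGES = {
  'https://example.com/sitemap.xml' => <<~XML,
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap><loc>https://example.com/sitemap-posts.xml</loc></sitemap>
    </sitemapindex>
  XML
  'https://example.com/sitemap-posts.xml' => <<~XML
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url><loc>https://example.com/posts/1</loc></url>
      <url><loc>https://example.com/posts/2</loc></url>
    </urlset>
  XML
}.freeze

# <loc> text under each root child named `tag`, matched by local name.
def locs(doc, tag)
  found = []
  doc.root&.each_element do |entry|
    next unless entry.name == tag

    entry.each_element { |c| found << c.text.to_s.strip if c.name == 'loc' }
  end
  found
end

# Follow a sitemap index into its child sitemaps; collect URLs from urlsets.
def collect_urls(url, fetch, seen = {})
  return [] if seen[url]

  seen[url] = true
  body = fetch.call(url)
  return [] unless body

  doc = REXML::Document.new(body)
  case doc.root&.name
  when 'sitemapindex'
    locs(doc, 'sitemap').flat_map { |child| collect_urls(child, fetch, seen) }
  when 'urlset'
    locs(doc, 'url')
  else
    []
  end
end

collect_urls('https://example.com/sitemap.xml', PAGES.method(:[]))
# => ["https://example.com/posts/1", "https://example.com/posts/2"]
```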
#urlset? ⇒ Boolean
Returns true if the sitemap lists regular URLs, i.e. its root element is urlset.
```ruby
# File 'lib/wayback_archiver/sitemap.rb', line 62

def urlset?
  root_name == 'urlset'
end
```
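Because sitemap_index? and urlset? compare the same root name, at most one of them can be true, and an XML document that is not a sitemap at all makes both false. The hypothetical sitemap_kind function below (not part of WaybackArchiver) folds the three predicates on this page into a single classification, assuming lenient parsing as in the non-strict constructor.

```ruby
require 'rexml/document'

# Hypothetical classifier combining plain_document?, sitemap_index? and
# urlset? into one symbol.
def sitemap_kind(xml)
  doc = begin
    REXML::Document.new(xml)
  rescue REXML::ParseException
    REXML::Document.new('') # lenient fallback, as in non-strict mode
  end

  return :plain if doc.elements.empty?

  case doc.root.name
  when 'sitemapindex' then :sitemap_index
  when 'urlset' then :urlset
  else :other # e.g. an RSS feed: both predicates are false
  end
end

sitemap_kind('<sitemapindex/>')        # => :sitemap_index
sitemap_kind('<urlset/>')              # => :urlset
sitemap_kind('<rss version="2.0"/>')   # => :other
sitemap_kind("https://example.com/\n") # => :plain
```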