Class: WaybackArchiver::Sitemap

Inherits:
Object
Defined in:
lib/wayback_archiver/sitemap.rb

Overview

Parses sitemaps as defined by the Sitemaps protocol, www.sitemaps.org


Constructor Details

#initialize(xml, strict: false) ⇒ Sitemap

Returns a new instance of Sitemap.



# File 'lib/wayback_archiver/sitemap.rb', line 8

def initialize(xml, strict: false)
  @document = REXML::Document.new(xml)
rescue REXML::ParseException => _e
  raise if strict

  @document = REXML::Document.new('')
end
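The rescue clause means malformed XML only raises when strict is true; otherwise the instance falls back to an empty document. A minimal sketch of that behavior follows. LenientXml is a hypothetical stand-in that copies the constructor above so the example runs without the gem installed; it is not part of WaybackArchiver.

```ruby
require 'rexml/document'

# Hypothetical stand-in mirroring the constructor shown above.
class LenientXml
  attr_reader :document

  def initialize(xml, strict: false)
    @document = REXML::Document.new(xml)
  rescue REXML::ParseException
    # Re-raise the parse error only when the caller asked for strict parsing.
    raise if strict

    @document = REXML::Document.new('')
  end
end

broken = '<urlset><url></urlset>' # mismatched closing tag

# Default (lenient) mode: the error is swallowed and the document is empty.
LenientXml.new(broken).document.root # => nil

# Strict mode: the REXML::ParseException reaches the caller.
begin
  LenientXml.new(broken, strict: true)
rescue REXML::ParseException => e
  warn "could not parse sitemap: #{e.class}"
end
```

Lenient parsing suits crawling, where one broken sitemap should not abort a run; strict parsing is more useful when validating input.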

Instance Attribute Details

#document ⇒ Object (readonly)

Returns the value of attribute document.



# File 'lib/wayback_archiver/sitemap.rb', line 6

def document
  @document
end

Instance Method Details

#plain_document? ⇒ Boolean

Checks whether the sitemap is a plain file, that is, whether the parsed document contains no XML elements

Returns:

  • (Boolean)

    whether document is plain



# File 'lib/wayback_archiver/sitemap.rb', line 36

def plain_document?
  document.elements.empty?
end
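As a rough illustration of what counts as plain, the sketch below applies the same elements.empty? check to a newline-separated URL list and to an XML sitemap. plain_content? is a hypothetical helper, not part of the gem, and it assumes the lenient (non-strict) fallback from the constructor.

```ruby
require 'rexml/document'

# Hypothetical helper combining the lenient constructor with plain_document?.
def plain_content?(content)
  document =
    begin
      REXML::Document.new(content)
    rescue REXML::ParseException
      # If REXML rejects the input, fall back to an empty document.
      REXML::Document.new('')
    end

  document.elements.empty?
end

plain_content?("https://example.com/\nhttps://example.com/about\n") # => true
plain_content?('<urlset><url><loc>https://example.com/</loc></url></urlset>') # => false
```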

#root_name ⇒ String

Returns the name of the document's root element, or nil if the document has no root

Returns:

  • (String, nil)

    the document root name, or nil if the document has no root element



# File 'lib/wayback_archiver/sitemap.rb', line 42

def root_name
  return unless document.root

  document.root.name
end
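The guard matters because an empty document has no root element. A small sketch of both cases, using a hypothetical root_name_of helper that applies the same logic to a raw XML string:

```ruby
require 'rexml/document'

# Hypothetical helper mirroring root_name for a raw XML string.
def root_name_of(xml)
  document = REXML::Document.new(xml)
  return unless document.root

  document.root.name
end

root_name_of('<sitemapindex></sitemapindex>') # => "sitemapindex"
root_name_of('')                              # => nil
```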

#sitemap_index? ⇒ Boolean

Returns true if the Sitemap is a Sitemap index

Examples:

Check if Sitemap is a sitemap index

sitemap = Sitemap.new(xml)
sitemap.sitemap_index?

Returns:

  • (Boolean)

    whether the Sitemap is a Sitemap index



# File 'lib/wayback_archiver/sitemap.rb', line 53

def sitemap_index?
  root_name == 'sitemapindex'
end

#sitemaps ⇒ Array<String>

Return all sitemap URLs defined in Sitemap.

Examples:

Get Sitemap URLs defined in Sitemap

sitemap = Sitemap.new(xml)
sitemap.sitemaps

Returns:

  • (Array<String>)

    the Sitemap URLs defined in the Sitemap.



# File 'lib/wayback_archiver/sitemap.rb', line 30

def sitemaps
  @sitemaps ||= extract_urls('sitemap')
end

#urls ⇒ Array<String>

Return all URLs defined in Sitemap.

Examples:

Get URLs defined in Sitemap

sitemap = Sitemap.new(xml)
sitemap.urls

Returns:

  • (Array<String>)

    the URLs defined in the Sitemap.



# File 'lib/wayback_archiver/sitemap.rb', line 21

def urls
  @urls ||= extract_urls('url')
end
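Both urls and sitemaps delegate to extract_urls, whose source is not shown on this page. The sketch below is only a guess at what such a helper might do, assuming it collects the text of the loc child of each matching element. extract_locs is a hypothetical name, and the real implementation may differ.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the private extract_urls helper (an assumption,
# not the gem's code): returns the <loc> text of every child element of the
# root whose name equals node_name.
def extract_locs(xml, node_name)
  document = REXML::Document.new(xml)
  return [] unless document.root

  document.root.elements.to_a
          .select { |node| node.name == node_name }
          .map { |node| node.elements.to_a.find { |child| child.name == 'loc' } }
          .compact
          .map { |loc| loc.text.to_s.strip }
end

xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <url><loc>https://example.com/</loc></url>
    <url><loc>https://example.com/about</loc></url>
  </urlset>
XML

extract_locs(xml, 'url')     # => ["https://example.com/", "https://example.com/about"]
extract_locs(xml, 'sitemap') # => []
```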

#urlset? ⇒ Boolean

Returns true if the Sitemap lists regular URLs

Examples:

Check if Sitemap is a regular URL list

sitemap = Sitemap.new(xml)
sitemap.urlset?

Returns:

  • (Boolean)

    whether the Sitemap is a regular URL list



# File 'lib/wayback_archiver/sitemap.rb', line 62

def urlset?
  root_name == 'urlset'
end
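Taken together, the three predicates let a caller decide how to process a fetched document. A minimal sketch of that dispatch, written against REXML directly so it runs without the gem; sitemap_kind and its return symbols are hypothetical:

```ruby
require 'rexml/document'

# Hypothetical helper: classifies raw sitemap content with the same checks as
# plain_document?, sitemap_index? and urlset?.
def sitemap_kind(content)
  document =
    begin
      REXML::Document.new(content)
    rescue REXML::ParseException
      REXML::Document.new('')
    end

  return :plain if document.elements.empty?

  case document.root.name
  when 'sitemapindex' then :sitemap_index # lists further sitemaps to fetch
  when 'urlset'       then :urlset        # lists page URLs
  else :unknown
  end
end

index = '<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">' \
        '<sitemap><loc>https://example.com/sitemap-1.xml</loc></sitemap></sitemapindex>'

sitemap_kind(index)                          # => :sitemap_index
sitemap_kind('<urlset></urlset>')            # => :urlset
sitemap_kind("https://example.com/\n")       # => :plain
```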