Class: WaybackArchiver::Sitemap

Inherits:
Object
Defined in:
lib/wayback_archiver/sitemap.rb

Overview

Parses sitemaps that follow the Sitemaps protocol (www.sitemaps.org).

Constructor Details

#initialize(xml_or_string, strict: false) ⇒ Sitemap

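Returns a new instance of Sitemap. If the input cannot be parsed and strict is false (the default), the document falls back to an empty REXML document; with strict: true the REXML::ParseException is re-raised.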



# File 'lib/wayback_archiver/sitemap.rb', line 9

def initialize(xml_or_string, strict: false)
  @contents = xml_or_string
  @document = REXML::Document.new(xml_or_string)
rescue REXML::ParseException => _e
  raise if strict

  @document = REXML::Document.new('')
end
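
The difference between the two modes is easiest to see side by side. The following is a minimal sketch rather than the library's own code path: SitemapSketch is a hypothetical stand-in that copies only the constructor shown above, so it runs with nothing but REXML installed.

```ruby
require 'rexml/document'

# Hypothetical stand-in that copies only the constructor shown above.
class SitemapSketch
  attr_reader :document

  def initialize(xml_or_string, strict: false)
    @contents = xml_or_string
    @document = REXML::Document.new(xml_or_string)
  rescue REXML::ParseException
    raise if strict

    @document = REXML::Document.new('')
  end
end

# Mismatched tags make REXML raise a ParseException.
broken = '<urlset><url></urlset>'

# Default (lenient) mode swallows the error and keeps an empty document.
lenient = SitemapSketch.new(broken)
puts lenient.document.root.nil?

# Strict mode re-raises the parse error to the caller.
begin
  SitemapSketch.new(broken, strict: true)
rescue REXML::ParseException => e
  puts "strict mode raised #{e.class}"
end
```

Lenient mode suits crawling, where one malformed sitemap should not abort the run; strict mode is the better fit when a parse failure needs to be reported.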

Instance Attribute Details

#document ⇒ Object (readonly)

Returns the value of attribute document.



# File 'lib/wayback_archiver/sitemap.rb', line 7

def document
  @document
end

Instance Method Details

#plain_document? ⇒ Boolean

Checks whether the sitemap is a plain file, that is, whether the parsed document contains no XML elements.

Returns:

  • (Boolean)

    whether document is plain



# File 'lib/wayback_archiver/sitemap.rb', line 38

def plain_document?
  document.elements.empty?
end
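
As an illustration of what "plain" means here, the sketch below feeds a newline-separated URL list and a small XML sitemap through the same check. SitemapSketch is a hypothetical stand-in whose constructor and predicate are copied from the listings on this page; the example.com URLs are made up.

```ruby
require 'rexml/document'

# Hypothetical stand-in: constructor and predicate copied from the listings.
class SitemapSketch
  attr_reader :document

  def initialize(xml_or_string, strict: false)
    @document = REXML::Document.new(xml_or_string)
  rescue REXML::ParseException
    raise if strict

    @document = REXML::Document.new('')
  end

  def plain_document?
    document.elements.empty?
  end
end

# A text sitemap: one URL per line, no XML elements at all.
plain = SitemapSketch.new("http://example.com\nhttp://example.com/about\n")

# An XML sitemap with a urlset root element.
xml = SitemapSketch.new('<urlset><url><loc>http://example.com</loc></url></urlset>')

puts plain.plain_document?
puts xml.plain_document?
```

Whether REXML accepts or rejects the text input depends on its version, but in non-strict mode both outcomes leave a document without elements, so the check gives the same answer either way.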

#root_name ⇒ String

Returns the name of the document's root element, or nil if the document has no root.

Returns:

  • (String, nil)

    the document root name, or nil when there is no root element



# File 'lib/wayback_archiver/sitemap.rb', line 44

def root_name
  return unless document.root

  document.root.name
end
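
The guard clause matters because an empty REXML document has no root. A minimal sketch, assuming the method body above is lifted into a free-standing function that takes the document as an argument (a change made only so the snippet runs on its own):

```ruby
require 'rexml/document'

# Same logic as the listing above, with the document passed in explicitly.
def root_name(document)
  return unless document.root

  document.root.name
end

index = REXML::Document.new('<sitemapindex></sitemapindex>')
empty = REXML::Document.new('')

p root_name(index)
p root_name(empty)
```

Callers that compare the result against a string, as the predicates on this page do, are unaffected by the nil case; callers that call string methods on it need to handle nil first.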

#sitemap_index? ⇒ Boolean

Returns true if the Sitemap is a Sitemap index.

Examples:

Check if Sitemap is a sitemap index

sitemap = Sitemap.new(xml)
sitemap.sitemap_index?

Returns:

  • (Boolean)

    whether the Sitemap is a Sitemap index



# File 'lib/wayback_archiver/sitemap.rb', line 55

def sitemap_index?
  root_name == 'sitemapindex'
end

#sitemaps ⇒ Array<String>

Return all sitemap URLs defined in Sitemap.

Examples:

Get Sitemap URLs defined in Sitemap

sitemap = Sitemap.new(xml)
sitemap.sitemaps

Returns:

  • (Array<String>)

    the Sitemap URLs defined in the Sitemap.



# File 'lib/wayback_archiver/sitemap.rb', line 32

def sitemaps
  @sitemaps ||= extract_urls('sitemap')
end
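
The extract_urls helper that this method delegates to is not shown on this page. The sketch below is one plausible way to collect the loc values with REXML, not the library's implementation: extract_locs, its traversal strategy, and the example.com URLs are all assumptions made for illustration.

```ruby
require 'rexml/document'

# Hypothetical helper (not the library's extract_urls): collects the text of
# every <loc> found directly under a root child whose name is node_name.
def extract_locs(document, node_name)
  return [] unless document.root

  locs = []
  document.root.each_element do |node|
    next unless node.name == node_name

    node.each_element do |child|
      locs << child.text.to_s.strip if child.name == 'loc'
    end
  end
  locs
end

index_xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap><loc>http://example.com/sitemap-posts.xml</loc></sitemap>
    <sitemap><loc>http://example.com/sitemap-pages.xml</loc></sitemap>
  </sitemapindex>
XML

document = REXML::Document.new(index_xml)

# 'sitemap' entries are present in an index; 'url' entries are not.
p extract_locs(document, 'sitemap')
p extract_locs(document, 'url')
```

Comparing element names directly, instead of using an XPath expression, sidesteps questions about how the default xmlns namespace is matched.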

#urls ⇒ Array<String>

Return all URLs defined in Sitemap.

Examples:

Get URLs defined in Sitemap

sitemap = Sitemap.new(xml)
sitemap.urls

Returns:

  • (Array<String>)

    the URLs defined in the Sitemap.



# File 'lib/wayback_archiver/sitemap.rb', line 23

def urls
  @urls ||= extract_urls('url')
end

#urlset? ⇒ Boolean

Returns true if the Sitemap lists regular URLs.

Examples:

Check if Sitemap is a regular URL list

sitemap = Sitemap.new(xml)
sitemap.urlset?

Returns:

  • (Boolean)

    whether the Sitemap is a regular URL list



# File 'lib/wayback_archiver/sitemap.rb', line 64

def urlset?
  root_name == 'urlset'
end
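
Taken together, the three predicates on this page sort a sitemap body into one of a few kinds. The sketch below shows that classification in one place; sitemap_kind and its symbol return values are hypothetical names introduced here, and the checks are re-implemented directly on REXML so the snippet is self-contained.

```ruby
require 'rexml/document'

# Hypothetical helper that classifies a sitemap body using the same checks
# as plain_document?, sitemap_index? and urlset? above.
def sitemap_kind(body)
  document = begin
    REXML::Document.new(body)
  rescue REXML::ParseException
    REXML::Document.new('') # same fallback as the non-strict constructor
  end

  return :plain if document.elements.empty?

  case document.root.name
  when 'sitemapindex' then :index
  when 'urlset' then :urlset
  else :unknown
  end
end

p sitemap_kind('<sitemapindex><sitemap><loc>http://example.com/a.xml</loc></sitemap></sitemapindex>')
p sitemap_kind('<urlset><url><loc>http://example.com/</loc></url></urlset>')
p sitemap_kind("http://example.com/\n")
```

A caller would typically read nested sitemap locations via #sitemaps for an index and page locations via #urls for a URL set.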