Class: UrlFinder::SitemapReader

Inherits:
BaseReader
Defined in:
lib/url_finder/readers/sitemap_reader.rb

Overview

Parses Sitemaps that follow the Sitemaps protocol (www.sitemaps.org).

Instance Attribute Summary

Attributes inherited from BaseReader

#content

Instance Method Summary

Methods inherited from BaseReader

#each, #empty?, #initialize, #to_a, urls

Constructor Details

This class inherits a constructor from UrlFinder::BaseReader

Instance Method Details

#document ⇒ REXML::Document

The parsed XML document, or an empty document if the content is not valid XML

Returns:

  • (REXML::Document)

    the XML document



# File 'lib/url_finder/readers/sitemap_reader.rb', line 17

def document
  @document ||= begin
    REXML::Document.new(content)
  rescue REXML::ParseException => _e
    REXML::Document.new('')
  end
end
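
The rescue-and-fallback behaviour above can be tried outside the class. This is a minimal sketch, assuming only the standard REXML library; `parse_or_empty` is a hypothetical stand-alone helper, not part of UrlFinder.

```ruby
require 'rexml/document'

# Hypothetical helper mirroring the fallback pattern used by #document.
def parse_or_empty(content)
  REXML::Document.new(content)
rescue REXML::ParseException
  # Malformed XML: return an empty document instead of raising.
  REXML::Document.new('')
end

valid  = parse_or_empty('<urlset><url><loc>https://example.com/</loc></url></urlset>')
broken = parse_or_empty('<urlset><url></urlset>') # mismatched end tag

puts valid.root.name      # name of the parsed root element
puts broken.root.inspect  # nil, because the fallback document has no root
```

Callers therefore never see a parse error from this pattern; they see a document without a root element instead.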

#plain_document? ⇒ Boolean

Checks whether the sitemap is a plain (non-XML) file, that is, whether the parsed document contains no elements

Returns:

  • (Boolean)

    whether document is plain



# File 'lib/url_finder/readers/sitemap_reader.rb', line 36

def plain_document?
  document.elements.empty?
end
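
As an illustration of the check above, a newline-separated URL list yields no XML elements, whether REXML rejects the text outright or parses it into an element-free document. This is a minimal sketch; `element_free?` is a hypothetical helper that folds in the same parse-error fallback as #document.

```ruby
require 'rexml/document'

# Hypothetical helper: true when the content yields no XML elements.
def element_free?(content)
  REXML::Document.new(content).elements.empty?
rescue REXML::ParseException
  # Unparseable content is treated like the empty fallback document.
  true
end

plain_list  = "https://example.com/\nhttps://example.com/about\n"
xml_sitemap = '<urlset><url><loc>https://example.com/</loc></url></urlset>'

puts element_free?(plain_list)
puts element_free?(xml_sitemap)
```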

#root_name ⇒ String

Returns the name of the document's root element, if there is one

Returns:

  • (String, nil)

    the root element name, or nil if the document has no root



# File 'lib/url_finder/readers/sitemap_reader.rb', line 42

def root_name
  return unless document.root

  document.root.name
end
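
The nil-guarded lookup above can also be written with Ruby's safe navigation operator. This is a minimal sketch, assuming only REXML; `root_name_of` is a hypothetical helper. Note that the element name is reported without its namespace.

```ruby
require 'rexml/document'

# Hypothetical helper: root element name, or nil when there is no root.
def root_name_of(xml)
  REXML::Document.new(xml).root&.name
end

urlset_name = root_name_of('<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"/>')
index_name  = root_name_of('<sitemapindex/>')
empty_name  = root_name_of('')

puts urlset_name
puts index_name
puts empty_name.inspect # nil: an empty document has no root element
```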

#sitemap_index? ⇒ Boolean

Returns true if the Sitemap is a Sitemap index

Examples:

Check if Sitemap is a sitemap index

sitemap = UrlFinder::SitemapReader.new(xml)
sitemap.sitemap_index?

Returns:

  • (Boolean)

    whether the Sitemap is a Sitemap index



# File 'lib/url_finder/readers/sitemap_reader.rb', line 53

def sitemap_index?
  root_name == 'sitemapindex'
end

#sitemaps ⇒ Array<String>

Return all sitemap URLs defined in Sitemap.

Examples:

Get Sitemap URLs defined in Sitemap

sitemap = UrlFinder::SitemapReader.new(xml)
sitemap.sitemaps

Returns:

  • (Array<String>)

    the Sitemap URLs defined in the Sitemap



# File 'lib/url_finder/readers/sitemap_reader.rb', line 30

def sitemaps
  @sitemaps ||= extract_urls('sitemap')
end
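
The private `extract_urls` helper is not shown on this page. The sketch below is one plausible way such a helper could collect `<loc>` values, and is an assumption rather than the library's implementation; `extract_locs` and the sample index are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical stand-in for the unseen extract_urls helper.
# Collects the <loc> text of every top-level entry named entry_name.
def extract_locs(xml, entry_name)
  root = REXML::Document.new(xml).root
  return [] unless root

  locs = []
  root.each_element do |entry|
    next unless entry.name == entry_name

    entry.each_element do |child|
      locs << child.text.to_s.strip if child.name == 'loc'
    end
  end
  locs
end

index_xml = <<~XML
  <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
    <sitemap>
      <loc>https://example.com/sitemap-posts.xml</loc>
      <lastmod>2024-01-01</lastmod>
    </sitemap>
    <sitemap>
      <loc>https://example.com/sitemap-pages.xml</loc>
    </sitemap>
  </sitemapindex>
XML

sitemap_locs = extract_locs(index_xml, 'sitemap')
url_locs     = extract_locs(index_xml, 'url')
puts sitemap_locs
```

Iterating over child elements and comparing names sidesteps XPath namespace handling, since Sitemap files normally declare a default namespace.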

#urls ⇒ Array<String>

Return all URLs defined in Sitemap.

Examples:

Get URLs defined in Sitemap

sitemap = UrlFinder::SitemapReader.new(xml)
sitemap.urls

Returns:

  • (Array<String>)

    the URLs defined in the Sitemap



# File 'lib/url_finder/readers/sitemap_reader.rb', line 11

def urls
  @urls ||= extract_urls('url')
end
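
The `@urls ||= ...` idiom above memoizes the result, so extraction runs only on the first call. The sketch below illustrates the idiom in isolation; `MemoizedList` and `expensive_extract` are hypothetical names and do not exist in UrlFinder.

```ruby
# Hypothetical class used only to illustrate the `@urls ||= ...` idiom.
class MemoizedList
  attr_reader :calls

  def initialize(values)
    @values = values
    @calls = 0
  end

  # Computes the list once, then returns the cached array.
  def urls
    @urls ||= expensive_extract
  end

  private

  # Stands in for the real extraction work and counts invocations.
  def expensive_extract
    @calls += 1
    @values.dup
  end
end

list = MemoizedList.new(['https://example.com/'])
first  = list.urls
second = list.urls
puts list.calls # 1: the second call reused the cached array
```

An empty array is truthy in Ruby, so an empty result is cached as well. Every call returns the same array object, so mutating it affects later callers.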

#urlset? ⇒ Boolean

Returns true if the Sitemap lists regular URLs

Examples:

Check if Sitemap is a regular URL list

sitemap = UrlFinder::SitemapReader.new(xml)
sitemap.urlset?

Returns:

  • (Boolean)

    whether the Sitemap is a regular URL list



# File 'lib/url_finder/readers/sitemap_reader.rb', line 62

def urlset?
  root_name == 'urlset'
end
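
The root-name predicates (#sitemap_index?, #urlset?) and #plain_document? together suggest a three-way dispatch. The sketch below shows how a caller might classify content before choosing which list to read; `sitemap_kind` and its return symbols are hypothetical and not part of UrlFinder.

```ruby
require 'rexml/document'

# Hypothetical classifier built on the same root-name comparison.
def sitemap_kind(xml)
  name = begin
    REXML::Document.new(xml).root&.name
  rescue REXML::ParseException
    nil # unparseable content is handled like a document without a root
  end

  return :plain if name.nil?

  case name
  when 'sitemapindex' then :index
  when 'urlset'       then :urlset
  else :unknown
  end
end

puts sitemap_kind('<sitemapindex><sitemap/></sitemapindex>')
puts sitemap_kind('<urlset><url/></urlset>')
puts sitemap_kind('')
puts sitemap_kind('<html/>')
```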