SitemapGenerator
SitemapGenerator generates Sitemaps for your Rails application. The Sitemaps adhere to the Sitemap 0.9 protocol specification. You specify the contents of your Sitemap using a configuration file, à la Rails Routes. A set of rake tasks is included to help you manage your Sitemaps.
Features
- Supports Video sitemaps and Image sitemaps
- Rails 3 compatible (beta)
- Adheres to the Sitemap 0.9 protocol
- Handles millions of links
- Compresses Sitemaps using GZip
- Notifies Search Engines (Google, Yahoo, Bing, Ask, SitemapWriter) of new sitemaps
- Ensures your old Sitemaps stay in place if the new Sitemap fails to generate
- You set the hostname (and protocol) of the links in your Sitemap
Changelog
- v1.1.0: Video sitemap support
- v0.2.6: Image Sitemap support
- v0.2.5: Rails 3 support (beta)
Foreword
Adam Salter first created SitemapGenerator while we were working together in Sydney, Australia. Unfortunately, he passed away in 2009. Since then I have taken over development of SitemapGenerator.
Those who knew him know what an amazing guy he was, and what an excellent Rails programmer he was. His passing is a great loss to the Rails community.
The canonical repository is now: http://github.com/kjvarga/sitemap_generator
Install
Rails 3:
Add the gem to your Gemfile
gem 'sitemap_generator'
$ rake sitemap:install
Rails 2.x: As a gem
Add the gem as a dependency in your config/environment.rb
config.gem 'sitemap_generator', :lib => false
$ rake gems:install
Add the following to your RAILS_ROOT/Rakefile
begin
  require 'sitemap_generator/tasks'
rescue Exception => e
  puts "Warning, couldn't load gem tasks: #{e.message}! Skipping..."
end
$ rake sitemap:install
Rails 2.x: As a plugin
$ ./script/plugin install git://github.com/kjvarga/sitemap_generator.git
Usage
rake sitemap:install creates a config/sitemap.rb file which will contain your logic for generating the Sitemap files.
Once you have configured your sitemap in config/sitemap.rb, run rake sitemap:refresh as needed to create or rebuild your Sitemap files. Sitemaps are generated into the public/ folder and are named sitemap_index.xml.gz, sitemap1.xml.gz, sitemap2.xml.gz, etc.
Running rake sitemap:refresh also notifies the major search engines (Google, Yahoo, Bing, Ask, SitemapWriter) that a new Sitemap is available. To generate new Sitemaps without notifying search engines (for example when running in a local environment), use rake sitemap:refresh:no_ping.
To ping Yahoo you will need to set your Yahoo AppID in config/sitemap.rb. For example: SitemapGenerator::Sitemap.yahoo_app_id = "my_app_id"
To disable all non-essential output (only errors will be displayed), run the rake tasks with the -s option. For example: rake -s sitemap:refresh
Cron
To keep your Sitemaps up-to-date, set up a cron job. Make sure to pass the -s option to silence rake. That way you will only get email when the sitemap build fails.
If you're using Whenever, your schedule would look something like the following:
# config/schedule.rb
every 1.day, :at => '5:00 am' do
  rake "-s sitemap:refresh"
end
Robots.txt
You should add the Sitemap index file to public/robots.txt to help search engines find your Sitemaps. The URL should be the complete URL to the Sitemap index file. For example:
Sitemap: http://www.example.org/sitemap_index.xml.gz
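If you would rather not edit robots.txt by hand, the entry can be added from Ruby. The following is a minimal sketch and not part of SitemapGenerator: the helper name add_sitemap_to_robots is hypothetical, and it assumes the index file keeps its default name, sitemap_index.xml.gz.

```ruby
# Hypothetical helper (not part of SitemapGenerator): appends a Sitemap
# directive to robots.txt unless that exact line is already present.
# Returns true if the line was added, false if it was already there.
def add_sitemap_to_robots(robots_path, host)
  # Assumes the default index file name, sitemap_index.xml.gz.
  line = "Sitemap: #{host.chomp('/')}/sitemap_index.xml.gz"
  existing = File.exist?(robots_path) ? File.read(robots_path) : ''
  return false if existing.split("\n").map { |l| l.strip }.include?(line)

  File.open(robots_path, 'a') do |f|
    # Start on a fresh line if the file does not end with a newline.
    f.puts unless existing.empty? || existing.end_with?("\n")
    f.puts line
  end
  true
end
```

It could be called as add_sitemap_to_robots('public/robots.txt', 'http://www.example.org'), for instance from a deploy task. Calling it twice leaves a single entry in the file.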
Image and Video Sitemaps
Images can be added to a sitemap URL by passing an :images array to add(). Each item in the array must be a Hash containing tags defined by the Image Sitemap specification. For example:
sitemap.add('/index.html', :images => [{ :loc => 'http://www.example.com/image.png', :title => 'Image' }])
A video can be added to a sitemap URL by passing a :video Hash to add(). The Hash can contain tags defined by the Video Sitemap specification. To associate more than one tag with a video, pass the tags as an array with the key :tags.
sitemap.add('/index.html', :video => {
  :thumbnail_loc => 'http://www.example.com/video1_thumbnail.png',
  :title => 'Title',
  :description => 'Description',
  :content_loc => 'http://www.example.com/cool_video.mpg',
  :tags => %w[one two three],
  :category => 'Category'
})
Example config/sitemap.rb
# Set the host name for URL creation
SitemapGenerator::Sitemap.default_host = "http://www.example.com"
SitemapGenerator::Sitemap.yahoo_app_id = nil # Set to your Yahoo AppID to ping Yahoo
SitemapGenerator::Sitemap.add_links do |sitemap|
  # Put links creation logic here.
  #
  # The Root Path ('/') and Sitemap Index file are added automatically.
  # Links are added to the Sitemap output in the order they are specified.
  #
  # Usage: sitemap.add path, options
  #        (default options are used if you don't specify them)
  #
  # Defaults: :priority => 0.5, :changefreq => 'weekly',
  #           :lastmod => Time.now, :host => default_host
  # add '/articles'
  sitemap.add articles_path, :priority => 0.7, :changefreq => 'daily'
  # add all articles
  Article.all.each do |a|
    sitemap.add article_path(a), :lastmod => a.updated_at
  end
  # add news page with images
  News.all.each do |news|
    images = news.images.collect do |image|
      { :loc => image.url, :title => image.name }
    end
    sitemap.add news_path(news), :images => images
  end
end
# Including Sitemaps from Rails Engines.
#
# These Sitemaps should be almost identical to a regular Sitemap file except
# they needn't define their own SitemapGenerator::Sitemap.default_host since
# they will undoubtedly share the host name of the application they belong to.
#
# As an example, say we have a Rails Engine in vendor/plugins/cadability_client
# We can include its Sitemap here as follows:
#
file = File.join(Rails.root, 'vendor/plugins/cadability_client/config/sitemap.rb')
eval(open(file).read, binding, file)
Raison d'être
Most of the Sitemap plugins out there seem to try to recreate the Sitemap links by iterating the Rails routes. In some cases this is possible, but in many cases it isn't.
a) There are probably quite a few routes in your routes file that don't need inclusion in the Sitemap. (AJAX routes I'm looking at you.)
and
b) How would you infer the correct series of links for the following route?
map.zipcode 'location/:state/:city/:zipcode', :controller => 'zipcode', :action => 'index'
Don't tell me it's trivial, because it isn't. It just looks trivial.
So my idea is to have another file similar to 'routes.rb' called 'sitemap.rb', where you can define what goes into the Sitemap.
Here's my solution:
Zipcode.find(:all, :include => :city).each do |z|
  sitemap.add zipcode_path(:state => z.city.state, :city => z.city, :zipcode => z)
end
Easy hey?
Other Sitemap settings for the link, like lastmod, priority, changefreq and host are entered automatically, although you can override them if you need to.
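One way to picture how defaults and overrides combine is a plain Hash merge. The sketch below is illustrative only and is not the gem's actual implementation: link_options is a hypothetical name, the host argument stands in for default_host, and the default values are the ones listed in the example config/sitemap.rb.

```ruby
# Illustrative only (not the gem's code): shows per-link options
# overriding a set of defaults through Hash#merge.
def link_options(overrides = {}, default_host = 'http://www.example.com')
  defaults = {
    :priority   => 0.5,
    :changefreq => 'weekly',
    :lastmod    => Time.now,
    :host       => default_host
  }
  # Keys given by the caller win; everything else keeps its default.
  defaults.merge(overrides)
end

opts = link_options(:priority => 0.7, :changefreq => 'daily')
puts opts[:priority]   # overridden by the caller
puts opts[:changefreq] # overridden by the caller
puts opts[:host]       # falls back to the default host
```

In config/sitemap.rb the same idea is expressed by passing options to add, as in sitemap.add articles_path, :priority => 0.7, :changefreq => 'daily'.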
Compatibility
Tested and working on:
- Rails 3.0.0
- Rails 1.x - 2.3.8
- Ruby 1.8.6, 1.8.7, 1.8.7 Enterprise Edition, 1.9.1
Notes
1) For large sitemaps it may be useful to split your generation into batches to avoid running out of memory. E.g.:
# add movies
Movie.find_in_batches(:batch_size => 1000) do |movies|
  movies.each do |movie|
    sitemap.add "/movies/show/#{movie.to_param}", :lastmod => movie.updated_at, :changefreq => 'weekly'
  end
end
2) New Capistrano deploys will remove your Sitemap files, unless you run rake sitemap:refresh. The way around this is to create a cap task that copies the Sitemap files over from the previous release:
after "deploy:update_code", "deploy:copy_old_sitemap"
namespace :deploy do
  task :copy_old_sitemap do
    run "if [ -e #{previous_release}/public/sitemap_index.xml.gz ]; then cp #{previous_release}/public/sitemap* #{current_release}/public/; fi"
  end
end
Known Bugs
- The length of each URL is not checked, even though a URL isn't supposed to exceed 2,048 bytes.
- Only one Sitemap Index file is currently supported. It can contain 50,000 Sitemap files, each of which can contain 50,000 URLs, so at most 2,500,000,000 (2.5 billion) URLs are supported. I personally have no need of support for more URLs, but the plugin could be improved to support this.
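The limits mentioned in this list can be worked through in a few lines of Ruby. This is a sketch under stated assumptions, not code from the gem: the method names are hypothetical, and the figures (2,048 bytes per URL, 50,000 links per Sitemap file, 50,000 Sitemap files per index) are the ones quoted above.

```ruby
# Hypothetical check that the gem does not perform itself:
# true when the URL is within the 2,048 byte limit.
def url_fits?(url, max_bytes = 2_048)
  url.bytesize <= max_bytes
end

# Number of Sitemap files needed for link_count links (ceiling division).
def sitemap_files_needed(link_count, links_per_file = 50_000)
  (link_count + links_per_file - 1) / links_per_file
end

# Maximum number of links a single Sitemap Index file can cover.
def max_links(links_per_file = 50_000, files_per_index = 50_000)
  links_per_file * files_per_index
end

puts max_links                      # => 2500000000
puts sitemap_files_needed(120_001)  # => 3
puts url_fits?('http://www.example.com/' + 'a' * 3_000) # => false
```

Until a check like this exists in the gem, you could call something like url_fits? yourself before passing unusually long paths to sitemap.add.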
Wishlist & Coming Soon
- Ultimately I'd like to make this gem framework agnostic. It is better suited to being run as a command-line tool than as Ruby-specific Rake tasks.
- Add rake tasks/options to validate the generated sitemaps.
- Support News, Mobile, Geo and other types of sitemaps
- Support for generating sitemaps for sites with multiple domains. Sitemaps can be generated into subdirectories and we can use Rack middleware to rewrite requests for sitemaps to the correct subdirectory based on the request host.
- Auto coverage testing. Generate a report of broken URLs by checking the status codes of each page in the sitemap.
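The multiple-domain idea from this list could look roughly like the following Rack middleware. Nothing here exists in SitemapGenerator today: the class name SitemapHostRewriter and the public/sitemaps/<host>/ directory layout are assumptions made purely for illustration.

```ruby
# Hypothetical Rack middleware (not part of SitemapGenerator). It assumes
# each domain's sitemaps were generated into public/sitemaps/<host>/.
class SitemapHostRewriter
  # Matches /sitemap_index.xml.gz and /sitemap1.xml.gz, /sitemap2.xml.gz, ...
  SITEMAP_PATH = %r{\A/sitemap(_index|\d+)\.xml\.gz\z}
  # Accept plain hostnames only, so the host cannot alter the path.
  SAFE_HOST = /\A[a-z0-9-]+(\.[a-z0-9-]+)*\z/i

  def initialize(app)
    @app = app
  end

  def call(env)
    if env['PATH_INFO'] =~ SITEMAP_PATH
      # Strip any port so "www.example.com:3000" maps to "www.example.com".
      host = env['HTTP_HOST'].to_s.split(':').first
      if host && host =~ SAFE_HOST
        env = env.merge('PATH_INFO' => "/sitemaps/#{host}#{env['PATH_INFO']}")
      end
    end
    @app.call(env)
  end
end
```

With this in the middleware stack, a request for /sitemap_index.xml.gz on www.example.com would be passed on as /sitemaps/www.example.com/sitemap_index.xml.gz, while all other requests go through unchanged.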
Thanks (in no particular order)
- Alex Soto for video sitemaps
- Alexandre Bini for image sitemaps
- Dan Pickett
- Rob Biedenharn
- Richie Vos
- Adrian Mugnolo
- Jason Weathered
- Andy Stewart
Copyright (c) 2009 Karl Varga released under the MIT license