Generating a sitemap with Ruby on Rails and uploading it to Amazon S3

Sitemap generators allow webmasters to easily generate sitemaps for their websites instead of preparing them manually in a spreadsheet or writing a script. There are many ways to generate a sitemap for a website. For example, if you have a WordPress site, many sitemap-generating plugins are available.


I was working on a client project based on Ruby on Rails and had to generate a sitemap for it. A sitemap is beneficial, and generating one with Ruby on Rails is a breeze for developers like us. In this post I walk through the step-by-step procedure for generating a sitemap and uploading it to Amazon S3. I hope this article helps you when you come across a similar situation.

Before we dive into the process of generating a sitemap, let’s understand what a sitemap actually does.

What is a sitemap?

A sitemap helps search engine bots discover and crawl your site's URLs so that they are properly indexed and can rank better. It is an XML file that lists the URLs of a site, reflecting how the website is organized and how its pages relate to one another. For each URL, webmasters can include additional information, such as when it was last updated, how often it changes, and how important it is relative to other URLs on the site. This helps search engines crawl the site more intelligently. Sitemaps are a URL inclusion protocol and complement robots.txt, a URL exclusion protocol.

A sitemap normally looks like the sample below; for more details, please check the Sitemaps protocol documentation.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<!-- URL definitions -->
</urlset>


The root element carries the sitemap schema definitions (shortened here), followed by one entry for every URL to be indexed.
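To make the structure concrete, here is a minimal sketch in plain Ruby (standard library only) that emits the same kind of document: one `<url>` element per page, with optional `<lastmod>`, `<changefreq>` and `<priority>` children. The `build_sitemap` helper and the example.com URLs are hypothetical, and this is not how the gem works internally; it only illustrates the bookkeeping that the gem automates.

```ruby
require 'date'

# Namespace defined by the Sitemaps 0.9 protocol.
SITEMAP_NS = 'http://www.sitemaps.org/schemas/sitemap/0.9'

# Hypothetical helper: builds a <urlset> document from hashes with :loc (required)
# and optional :lastmod (a Date), :changefreq and :priority keys.
def build_sitemap(entries)
  lines = ['<?xml version="1.0" encoding="UTF-8"?>', %(<urlset xmlns="#{SITEMAP_NS}">)]
  entries.each do |entry|
    lines << '  <url>'
    # encode(xml: :text) escapes &, < and >, which URLs with query strings need
    lines << "    <loc>#{entry.fetch(:loc).encode(xml: :text)}</loc>"
    lines << "    <lastmod>#{entry[:lastmod].iso8601}</lastmod>" if entry[:lastmod]
    lines << "    <changefreq>#{entry[:changefreq]}</changefreq>" if entry[:changefreq]
    lines << "    <priority>#{format('%.1f', entry[:priority])}</priority>" if entry[:priority]
    lines << '  </url>'
  end
  lines << '</urlset>'
  lines.join("\n")
end

# Hypothetical example URLs
xml = build_sitemap([
  { loc: 'https://www.example.com/about', priority: 1.0 },
  { loc: 'https://www.example.com/blogs?page=2&sort=new',
    lastmod: Date.new(2020, 1, 15), changefreq: 'weekly' }
])
puts xml
```

Note the escaped `&amp;` in the second URL: the protocol requires entity-escaping inside `<loc>`, which is one more detail that is easy to get wrong by hand.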

You could build this file yourself with XML Builder or by handcrafting the XML, but that is tricky, and updating the last modification dates or other small details every time something is added to the site quickly becomes unsustainable.

We can automate this process with the help of the sitemap_generator gem.

Using the gem

The greatest benefit of this gem is that it adheres to the Sitemaps 0.9 protocol. Besides regular links, it also supports news, video, image, mobile and geo sitemaps. Sitemap Generator also provides Ruby on Rails integration out of the box.

To get started, add the following to your Gemfile:

gem 'sitemap_generator'

After running bundle install, run the rake task below to create a default config/sitemap.rb file that you can edit:

rake sitemap:install

Simple Example

Let’s take a look at a simple config/sitemap.rb:

# Set the host name for URL creation (placeholder: replace with your site's root URL)
SitemapGenerator::Sitemap.default_host = "https://www.example.com"

# Pick a safe place to write the files before uploading them
SitemapGenerator::Sitemap.public_path = 'tmp/sitemaps/'

SitemapGenerator::Sitemap.create do
  add clients_path, priority: 0.9
  add team_path, priority: 0.8
  add about_path, priority: 1.0
  add contact_path
  add blogs_path, changefreq: 'weekly'

  # One entry per blog post, addressed by its slug
  Blog.find_each do |blog|
    add blog_path(blog.slug), lastmod: blog.updated_at, priority: 0.7, changefreq: 'never'
  end
end


There are a few things to note here:

  1. Set default_host to your website's root URL. The search engines reading your sitemap need to know which website they are dealing with.
  2. Set public_path to tmp/sitemaps/ so the sitemap files are written there before uploading.
  3. Add the URLs; see below for more details.

Adding URLs

You call add inside the block passed to create to add a path to your sitemap. add takes a string path and an optional hash of options, generates the full URL (using default_host), and adds it to the sitemap.
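As a rough mental model of what a single add call records, the sketch below joins default_host with the path and merges the given options over defaults. The `sitemap_entry` helper is hypothetical, not the gem's code; the default values (changefreq 'weekly', priority 0.5) are the ones listed in the sitemap_generator README, so double-check them against the version you use.

```ruby
require 'uri'

# Defaults the sitemap_generator README lists for `add` options.
ADD_DEFAULTS = { changefreq: 'weekly', priority: 0.5 }.freeze

# Hypothetical helper approximating what one `add path, options` call records:
# an absolute URL built from default_host, plus options merged over the defaults.
def sitemap_entry(default_host, path, **options)
  { loc: URI.join(default_host, path).to_s }.merge(ADD_DEFAULTS, options)
end

host = 'https://www.example.com' # placeholder host
about   = sitemap_entry(host, '/about', priority: 1.0)
blogs   = sitemap_entry(host, '/blogs', changefreq: 'daily')
contact = sitemap_entry(host, '/contact') # no options: the defaults apply
p contact
```

This is why contact_path in the example above still gets a changefreq and priority even though none are passed.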

For blogs_path we set changefreq to weekly to tell crawlers and indexers how often that index page is likely to change. If we published a new blog post every day, we could set it to daily.

For about_path we set priority to 1.0, marking it as the most important page of the site for crawlers and indexers, since we want this page to appear first in search results. Keep in mind that priority only ranks a page relative to the other URLs on your own site.

The last addition is more interesting, as it relates to indexing dynamic content. Our Blog model uses a slug in the URL rather than the record id, so to get the blogs indexed correctly, we add the URL for each blog using its slug.

Additionally, we’ve set the changefreq to never, as once a blog is published, it’s unlikely to be changed.
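The dynamic part of the configuration can be sketched without Rails: one entry per record, addressed by slug, with values checked against what the Sitemaps protocol allows (changefreq must be one of seven keywords and priority must lie between 0.0 and 1.0). `Blog` here is a hypothetical Struct standing in for the ActiveRecord model, and the `/blogs/<slug>` path is an assumed route shape.

```ruby
# Values the Sitemaps 0.9 protocol allows for <changefreq>.
CHANGEFREQS = %w[always hourly daily weekly monthly yearly never].freeze

# Raises ArgumentError if an entry uses a value the protocol does not allow.
def validate_entry!(entry)
  freq = entry[:changefreq]
  raise ArgumentError, "invalid changefreq: #{freq}" if freq && !CHANGEFREQS.include?(freq)

  prio = entry[:priority]
  raise ArgumentError, "priority out of range: #{prio}" if prio && !(0.0..1.0).cover?(prio)

  entry
end

# Hypothetical stand-in for the Blog model: only the attributes the sitemap needs.
Blog = Struct.new(:slug, :updated_at)

# One entry per blog, addressed by slug (the /blogs/<slug> route shape is assumed).
def blog_entries(host, blogs)
  blogs.map do |blog|
    entry = { loc: "#{host}/blogs/#{blog.slug}", lastmod: blog.updated_at,
              priority: 0.7, changefreq: 'never' }
    validate_entry!(entry)
  end
end

entries = blog_entries('https://www.example.com',
                       [Blog.new('rails-sitemaps', Time.utc(2020, 1, 15)),
                        Blog.new('s3-uploads', Time.utc(2020, 2, 1))])
entries.each { |e| puts e[:loc] }
```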

Generating the sitemaps

The gem provides a series of rake tasks to create your sitemap:

rake sitemap:create

The above task generates the compressed (gzipped) XML files in the folder specified by public_path.
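The .xml.gz files are ordinary gzip archives of the sitemap XML, a format the Sitemaps protocol explicitly allows. A minimal sketch of that step with Ruby's zlib, writing to a temporary directory and reading the file back; the `write_gzipped_sitemap` and `gzip_round_trip` helpers are hypothetical, not part of the gem.

```ruby
require 'zlib'
require 'tmpdir'

# Hypothetical helper: writes xml as <name>.xml.gz inside dir and returns the path.
def write_gzipped_sitemap(dir, name, xml)
  path = File.join(dir, "#{name}.xml.gz")
  Zlib::GzipWriter.open(path) { |gz| gz.write(xml) }
  path
end

# Hypothetical helper: compresses xml into a temporary directory, reads it back,
# and returns the file name together with the decompressed content.
def gzip_round_trip(xml)
  Dir.mktmpdir do |dir|
    path = write_gzipped_sitemap(dir, 'sitemap', xml)
    [File.basename(path), Zlib::GzipReader.open(path, &:read)]
  end
end

xml = %(<?xml version="1.0" encoding="UTF-8"?>\n) +
      %(<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"></urlset>)
name, restored = gzip_round_trip(xml)
puts "#{name} round trip ok: #{restored == xml}"
```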

rake sitemap:refresh

The above task does the same as the previous one, but it also pings Google and Bing so they know to fetch your newly created sitemap and update their indexed information about the site. You can ping other search engines as well, as described in the gem's documentation.
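Conceptually, a ping is an HTTP GET request to a search engine endpoint that carries the sitemap's URL as a query parameter, so that URL has to be percent-escaped. The sketch below only builds such a request URL and sends nothing; the `ping_url` helper and the endpoint template are hypothetical placeholders, not a real search engine's address.

```ruby
require 'uri'

# Hypothetical helper: fills a ping endpoint template (with a %s placeholder)
# with the percent-escaped sitemap URL. It only builds the URL; nothing is sent.
def ping_url(template, sitemap_url)
  format(template, URI.encode_www_form_component(sitemap_url))
end

# Placeholder endpoint and sitemap URL, not real services.
url = ping_url('https://search.example.com/ping?sitemap=%s',
               'https://www.example.com/sitemap.xml.gz')
puts url
```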

Finally, you should set up a cron job on your server to call rake sitemap:refresh as often as needed.

Uploading the sitemaps to S3

Normally, with the default configuration on a VPS, search engines have no trouble fetching your sitemap from your public folder, since the file is reachable directly under your site's domain.

However, if our application is hosted on Heroku, we face two problems due to its ephemeral filesystem:

  1. We can’t write to the public folder. That’s why we use the tmp folder in our sitemap configuration file above.
  2. We can’t guarantee how long anything we save in the tmp folder will stay there.

To get around this, we need to host the generated sitemap somewhere else and let the search engines access it there. The Sitemap Generator gem offers ways to save the generated files on S3 using fog or CarrierWave, so if you already use either of those in your application, have a look at the gem's wiki. However, installing fog or CarrierWave just for this is overkill, so here’s a way to do it that depends only on the aws-sdk gem.

Once the aws-sdk gem is installed, we also need an Amazon S3 bucket and the proper credentials set in the corresponding Heroku configuration panel and/or in your local environment for testing:

  • An S3 Access Key Id: ENV['S3_ACCESS_KEY_ID']
  • An S3 Secret Access Key: ENV['S3_SECRET_ACCESS_KEY']
  • The name of the bucket to use: ENV['S3_BUCKET']
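Because a missing variable would otherwise surface only as a failed upload later, it can help to check them up front. A minimal sketch, assuming the three variable names above; `missing_s3_vars` is a hypothetical helper that accepts any Hash-like source, so it can be tested without touching the real ENV.

```ruby
# The variable names listed above.
REQUIRED_S3_VARS = %w[S3_ACCESS_KEY_ID S3_SECRET_ACCESS_KEY S3_BUCKET].freeze

# Hypothetical helper: returns the required names that are unset or blank in env
# (any Hash-like object; defaults to the real ENV).
def missing_s3_vars(env = ENV)
  REQUIRED_S3_VARS.select { |name| env[name].to_s.strip.empty? }
end

missing = missing_s3_vars
warn "Missing S3 settings: #{missing.join(', ')}" unless missing.empty?
```

In a rake task, the same check could run before the S3 client is created and abort early instead of warning.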

Once these are set (and referenced from settings.yml, if you use one), we need a rake task like the following:

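namespace :sitemap do
  desc 'Upload the sitemap files to S3'
  task upload_to_s3: :environment do
    # Credentials come from the environment variables listed above.
    # The region is an assumption: set AWS_REGION to your bucket's region.
    s3 = Aws::S3::Client.new(region: ENV['AWS_REGION'],
                             access_key_id: ENV['S3_ACCESS_KEY_ID'],
                             secret_access_key: ENV['S3_SECRET_ACCESS_KEY'])

    # Upload every gzipped sitemap file (index and sub-sitemaps) written to tmp/sitemaps/
    Dir.glob(Rails.root.join('tmp', 'sitemaps', '*.xml.gz').to_s).each do |path|
      file_name = File.basename(path)
      key = "sitemaps/#{file_name}"

      File.open(path, 'rb') do |file|
        s3.put_object(bucket: ENV['S3_BUCKET'],
                      key: key,
                      body: file,
                      acl: 'public-read')
      end

      puts "Saved to S3: #{ENV['S3_BUCKET']}/#{key}"
    end
  end
end
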

Using the above task, we write the files to our remote bucket under a sitemaps folder, which must be writable with the credentials configured above.
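Why match every *.xml.gz file instead of a single name: the Sitemaps protocol limits each sitemap file to 50,000 URLs (and 50 MB uncompressed), and when a site exceeds that, sitemap_generator writes several sitemap files plus an index file. The sketch below shows which files the task would pick up and the keys they would get; `sitemap_upload_plan` is a hypothetical helper, the `sitemaps/` prefix is the assumed folder name, and the simulated file names are illustrative.

```ruby
require 'fileutils'
require 'tmpdir'

# Hypothetical helper: maps every gzipped sitemap file in dir to the S3 key it
# would be uploaded under. The 'sitemaps/' prefix is an assumed folder name.
def sitemap_upload_plan(dir, prefix: 'sitemaps/')
  Dir.glob(File.join(dir, '*.xml.gz')).sort.to_h do |path|
    [path, "#{prefix}#{File.basename(path)}"]
  end
end

# Simulate a split sitemap (index plus one sub-sitemap) next to an unrelated file.
keys = Dir.mktmpdir do |dir|
  %w[sitemap.xml.gz sitemap1.xml.gz notes.txt].each { |f| FileUtils.touch(File.join(dir, f)) }
  sitemap_upload_plan(dir).values
end
puts keys
```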

Finally, we need a rake task that we can schedule in cron and that takes care of everything: creating the sitemap, uploading it to S3 and pinging the search engines:

Rake::Task["sitemap:create"].enhance do
  if Rails.env.production? && Settings.sitemaps.ping_enabled?
    Rake::Task["sitemap:upload_to_s3"].invoke
    # Point the search engines at the sitemap on S3, not on our server
    SitemapGenerator::Sitemap.ping_search_engines(:sitemap_index_url => "https://#{}")
  end
end


Here we extend the gem's default sitemap:create task using enhance. Note that in the last call we send the search engines the URL where they can find our sitemap on S3, because the file is not on our server.
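If enhance is new to you: it appends an extra action to an existing task, so the original task body runs first and the added block runs right after it. A self-contained sketch using plain Rake, with hypothetical demo:create and demo:upload tasks standing in for sitemap:create and sitemap:upload_to_s3, records the order in which the actions run.

```ruby
require 'rake'

extend Rake::DSL # task/namespace helpers at the top level, as in a Rakefile

log = []

namespace :demo do
  task(:create) { log << :create }
  task(:upload) { log << :upload }
end

# enhance appends this block as an extra action of demo:create
Rake::Task['demo:create'].enhance do
  Rake::Task['demo:upload'].invoke
  log << :ping
end

Rake::Task['demo:create'].invoke
p log
```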

Configure sitemap in robots.txt

Robots.txt is a standard used by websites to communicate with web crawlers and other web robots. In your public/robots.txt, set Sitemap to the full URL of your remote sitemap file (robots.txt is a static file, so write the literal URL):

Sitemap: https://#{}
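This line matters because the sitemap now lives on a different host than the site: the Sitemaps protocol lets a site reference a sitemap hosted elsewhere by pointing to it from the site's own robots.txt. A small sketch that derives the line from S3's virtual-hosted-style object URL; the helpers, bucket name, region and key are hypothetical placeholders, so use the values from your own upload.

```ruby
# Hypothetical helper: public URL of an S3 object in virtual-hosted style
# (https://<bucket>.s3.<region>.amazonaws.com/<key>).
def s3_public_url(bucket, region, key)
  "https://#{bucket}.s3.#{region}.amazonaws.com/#{key}"
end

# The robots.txt directive pointing crawlers to that URL.
def robots_sitemap_line(url)
  "Sitemap: #{url}"
end

# Placeholder bucket, region and key; use the values from your own upload task.
url = s3_public_url('my-sitemaps-bucket', 'us-east-1', 'sitemaps/sitemap.xml.gz')
puts robots_sitemap_line(url)
```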

With the help of a scheduler or cron, we can automate the above rake task by scheduling the following command:

rake sitemap:refresh


Sitemaps are particularly beneficial on websites where:

  • Some areas of the website are not available through the browser interface
  • The website uses rich Ajax, Silverlight, or Flash content that search engines do not normally process
  • The site is very large, so web crawlers might overlook some new or recently updated content
  • The website has a huge number of pages that are isolated or not well linked together

I hope this post is informative and helpful to you. As a Ruby on Rails expert, generating this sitemap took me just a few minutes. Our team at Agira Technologies has worked on many projects using Ruby on Rails. Follow us to learn more about our Ruby on Rails work.