Ultimate Guide to Sitemaps and How They Affect SEO

‘Create and submit a sitemap’ – you can see this advice almost everywhere from SEO guides to webmaster forums. A sitemap is an essential element of website optimization, which can help a lot if used properly.

Sitemap creation and optimization is not rocket science but has its peculiarities as much as anything else. In this post, you will learn what sitemaps are and how to optimize them to improve website crawling.

1. What Is a Sitemap and Why You Need One

A sitemap is a file that lists all the essential pages of your site that you want search engines to index, along with a bit of additional data about them, such as the last modified date, change frequency, or alternate language versions.

Search robots will eventually find all publicly available pages on your website, whether or not they are present in your sitemap. However, if your site is large, discovering individual deep pages may take more time because of a limited crawl budget.

In layman’s terms, using a sitemap, you provide search robots with direct page addresses to prioritize their crawling and indexing.

[Image: Sitemap and website]

From the image above, it’s clear that to find the blue houses at the very bottom, a search spider needs to pass through several green ones, spending its resources along the way. A sitemap puts all the houses on a single street.

It’s important to realize that while a sitemap helps search engines crawl your pages more effectively, it does not guarantee they will eventually be indexed. Search robots will evaluate those pages just like any other pages on the web and decide whether they are good enough to be shown in search. You can use a tool like IndexCheckr to check if your pages are indexed by Google.

1.2. When Exactly You Need a Sitemap

While a sitemap is undoubtedly a useful tool, it’s worth mentioning the specific cases when it’s essential.

A website should have a sitemap if:

  • It’s new and lacks external incoming links (backlinks from other sites). How should robots discover your pages if they don't have backlinks from elsewhere on the web?
  • It’s huge and/or is regularly updated. Massive websites need to use crawl rate effectively to avoid spending it on low-quality and utility pages. Using a sitemap, you can also tell search engines about pages that got recently updated.
  • It has a complicated internal structure with deep and orphaned pages. Keep in mind that in this case a sitemap is not a magic pill: you still need to work on proper website architecture and internal linking.
  • It contains rich media content. There are specific sitemaps for news, videos, and images.

According to Google: ‘In most cases, your site will benefit from having a sitemap, and you'll never be penalized for having one.’

2. Types of Sitemaps

In previous paragraphs, I used ‘sitemap’ as a collective term. In reality, there are several types of sitemaps made for different purposes. Here are the most popular ones.

XML Sitemap

XML stands for Extensible Markup Language and is similar to Hypertext Markup Language (HTML). Unlike HTML, which is designed to display data, XML is designed to store and transport it.

The XML sitemap is the most widespread type and is usually what people mean when they talk about sitemaps in general.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://www.example.com/</loc>
    <lastmod>2019-12-12</lastmod>
    <changefreq>monthly</changefreq>
    <priority>0.9</priority>
  </url>
</urlset>
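If you would rather generate such a file than type it by hand, here is a minimal Python sketch using only the standard library. The function name build_sitemap and the sample page list are my own assumptions for illustration, not part of any sitemap tool.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as UTF-8 bytes for (url, lastmod) pairs.

    `pages` is a hypothetical input format; lastmod may be None.
    """
    # Register the sitemap namespace as the default one so tags are
    # written as <url> and <loc> rather than <ns0:url>.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = loc
        if lastmod:
            ET.SubElement(url, f"{{{SITEMAP_NS}}}lastmod").text = lastmod
    # xml_declaration requires Python 3.8 or newer.
    return ET.tostring(urlset, encoding="utf-8", xml_declaration=True)


sitemap_xml = build_sitemap([
    ("http://www.example.com/", "2019-12-12"),
    ("http://www.example.com/about/", None),
])
print(sitemap_xml.decode("utf-8"))
```

ElementTree escapes special characters in the URLs for you, which is one reason to prefer it over string concatenation.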

Alternate Language or Region XML Sitemap

If your website has pages available in several languages or made for specific regions, they should be correctly optimized for search engines. There are multiple ways to specify alternate versions of a page, and one of them is using a sitemap and the ‘hreflang’ attribute.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url>
    <loc>http://www.example.com/en/page.html</loc>
    <xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/de/page.html"/>
    <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/en/page.html"/>
  </url>
  <url>
    <loc>http://www.example.com/de/page.html</loc>
    <xhtml:link rel="alternate" hreflang="de" href="http://www.example.com/de/page.html"/>
    <xhtml:link rel="alternate" hreflang="en" href="http://www.example.com/en/page.html"/>
  </url>
</urlset>

Each page language version must link to other language versions and to itself. Learn more about hreflang.
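The "every version links to every version, including itself" rule is easy to get wrong by hand, so it is a good candidate for a script. Below is a rough Python sketch; the function name build_hreflang_sitemap and the dictionary input format are assumptions I made for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(versions):
    """Build a sitemap string from {hreflang code: URL} for one page.

    Each language version gets its own <url> entry, and every entry
    lists all versions (itself included) as alternates.
    """
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in versions.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for lang, href in versions.items():
            ET.SubElement(
                url,
                f"{{{XHTML_NS}}}link",
                {"rel": "alternate", "hreflang": lang, "href": href},
            )
    return ET.tostring(urlset, encoding="unicode")


hreflang_xml = build_hreflang_sitemap({
    "de": "http://www.example.com/de/page.html",
    "en": "http://www.example.com/en/page.html",
})
print(hreflang_xml)
```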

Image XML Sitemap

Together with schema markup, an image sitemap provides Google with additional information about the images on your site, optimizing them for image search. It helps search robots find images that are loaded with JavaScript, and each page entry can list up to 1,000 images.

You can either create a separate image sitemap or include images in your regular sitemap. Images located on another domain can also be added if both domains are verified in the Search Console.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>http://example.com/sample.html</loc>
    <image:image>
      <image:loc>http://example.com/image.jpg</image:loc>
    </image:image>
  </url>
</urlset>

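You can also specify optional tags like the ones below. Note that Google announced in 2022 that it was deprecating these four image tags, so check the current documentation before relying on them: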

  • <image:caption>
  • <image:geo_location>
  • <image:title>
  • <image:license>

Image sitemaps are pretty much unnecessary nowadays unless your website is heavy on images (photo aggregator, stock). If you take a look at the set of metadata you can specify using ImageObject schema, you will see that it’s much more extensive.

Video XML Sitemap

Similar to the image sitemap, a video sitemap provides additional info about video content on your site and can be either separate or embedded in a general sitemap.

Example:

<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>http://www.example.com/videos/page.html</loc>
    <video:video>
      <video:thumbnail_loc>http://www.example.com/sample.jpg</video:thumbnail_loc>
      <video:title>Sitemap optimization guide</video:title>
      <video:description>Our senior specialist shares tips on sitemap SEO</video:description>
      <video:content_loc>http://streamserver.example.com/video.mp4</video:content_loc>
    </video:video>
  </url>
</urlset>

Optional tags are listed in Google’s video sitemap documentation.

You can add multiple videos within one page in your sitemap, but make sure they are relevant to the page. You should also use VideoObject schema to describe videos to search engines.

News XML Sitemap

This sitemap type is created specifically for websites included in Google News. It helps the latest news published on your website get crawled much faster and is essential for news sites and aggregators.

Google recommends adding up to 1,000 URLs per sitemap and including only news published in the last two days.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.example.org/search/news.html</loc>
    <news:news>
      <news:publication>
        <news:name>Breaking News</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2019-12-12</news:publication_date>
      <news:title>Google increased number of ads on the first page to 10</news:title>
    </news:news>
  </url>
</urlset>

XML Sitemap Index

A sitemap index file is a file that lists multiple sitemaps. Why is it needed? Each sitemap is limited to 50,000 URLs and 50 MB uncompressed. If your sitemap exceeds either limit, split it into multiple files and list them in a sitemap index file.

Example:

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>http://www.example.com/sitemap1.xml</loc>
    <lastmod>2018-12-12T15:22:19+00:00</lastmod>
  </sitemap>
  <sitemap>
    <loc>http://www.example.com/sitemap2.xml</loc>
    <lastmod>2018-12-12</lastmod>
  </sitemap>
</sitemapindex>

The sitemap index file can contain sitemaps from different domains, provided they are all verified in Search Console.
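To make the 50,000-URL limit concrete, here is a small Python sketch that splits a URL list into sitemap-sized chunks and builds an index that points at them. The helper names and the sitemapN.xml naming scheme are assumptions for illustration; writing the individual sitemap files is left out.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000  # per-file URL limit mentioned above


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into lists of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return a sitemap index (as a string) listing the given sitemaps."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_urls:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    return ET.tostring(index, encoding="unicode")


# 120,000 made-up URLs need three sitemap files.
urls = [f"http://www.example.com/page-{n}" for n in range(120_000)]
chunks = chunk_urls(urls)
index_xml = build_sitemap_index(
    [f"http://www.example.com/sitemap{i}.xml" for i in range(1, len(chunks) + 1)]
)
print(len(chunks))
```

A real script would also check the 50 MB size limit, since a file with long URLs can hit it before reaching 50,000 entries.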

HTML Sitemap

An HTML sitemap is a webpage that contains hyperlinks to all the important pages of your site, with descriptive anchor text. Unlike the XML sitemap, which is made for search robots, the HTML sitemap is meant to help users navigate your site.

In an ideal world, your site should have a logical structure where users can access all pages without a hitch. In reality, lots of sites have a complicated structure, deep pages, and no internal search. In such a case, an HTML sitemap may help.

[Image: PayPal’s HTML sitemap]

An HTML sitemap doesn’t bring much benefit from the SEO point of view and is pretty simple to set up, so let’s focus on the XML sitemap next.

3. XML Sitemap Tags

As you can see from the examples above, there’s a bunch of metadata that can be specified in different kinds of sitemap. Some of the tags are a must, and some are optional. Let’s have a quick look at the main tags and their relevance.

[Image: XML sitemap tags]

Required Tags

Tags highlighted in green are compulsory and serve as the specification of a sitemap version, encoding, protocol, and address of each URL.

XML sitemaps must be UTF-8 encoded, and all the data inside them must be escaped following the standards. <urlset> opens a sitemap and specifies the protocol it uses.

The <url> tag specifies each URL entry. It’s a parent tag with a number of child tags that depend on the sitemap type.

<loc> is the primary and obligatory tag for each URL, and it holds the address of a page. Each <loc> tag must contain a full page URL (including the http/https protocol and the trailing slash, if your URLs use one). In other words, if you want a search robot to be able to crawl the page, make sure the address is exactly right.
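Escaping is where hand-written sitemaps most often break, usually because of an ampersand in a query string. If you build the file with string templates rather than an XML library, a sketch like the following can help. The function name escape_loc is an assumption for this example; escape itself comes from Python's standard library.

```python
from xml.sax.saxutils import escape


def escape_loc(url):
    """Entity-escape a URL so it is safe inside a <loc> tag.

    escape() handles &, < and > by default; the extra mapping
    adds single and double quotes.
    """
    return escape(url, {"'": "&apos;", '"': "&quot;"})


print(escape_loc("http://www.example.com/catalog?item=12&desc=vacation"))
# prints http://www.example.com/catalog?item=12&amp;desc=vacation
```

If you generate the sitemap with an XML library instead, this step is normally done for you, and escaping a second time would corrupt the URLs.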

Optional Tags

Tags highlighted in red are optional and provide search engines with additional information about each URL.

<lastmod> shows a page’s last modified date and time, which must be specified in the W3C Datetime format. Google uses this to identify the original author and recrawl pages if they have been recently changed. While John Mueller from Google confirmed they use the lastmod tag, don’t think you can abuse it by updating it each time you make an insignificant change.
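W3C Datetime is a profile of ISO 8601, so most languages can produce it directly. Here is a short Python sketch of the two forms used in the examples above (full timestamp and date only). The helper names are mine, and treating timezone-less values as UTC is an assumption you should adjust to your own data.

```python
from datetime import datetime, timezone


def to_w3c_datetime(dt):
    """Format a datetime as a full W3C Datetime with a UTC offset."""
    if dt.tzinfo is None:
        # Assumption: naive datetimes are already in UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    return dt.isoformat(timespec="seconds")


def to_w3c_date(dt):
    """Format only the date part (YYYY-MM-DD), which is also valid."""
    return dt.date().isoformat()


modified = datetime(2018, 12, 12, 15, 22, 19, tzinfo=timezone.utc)
print(to_w3c_datetime(modified))  # 2018-12-12T15:22:19+00:00
print(to_w3c_date(modified))      # 2018-12-12
```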

<changefreq> is a tag that describes how frequently the page is getting updated. You can set the following values:

  • always
  • hourly
  • daily
  • weekly
  • monthly
  • yearly
  • never

This tag is not a directive to search robots, so don’t expect robots to recrawl your pages each second if you set ‘always’.

<priority> specifies the priority of a page compared to other pages on the site. You can set a value from 0.0 to 1.0, where 1.0 is the highest priority.

According to several Google representatives (John Mueller and Gary Illyes), neither <changefreq> nor <priority> is used by Google anymore.

4. How to Optimize XML Sitemap

A sitemap is not a ranking factor itself, and your rankings won’t skyrocket as soon as you create it. You won't be penalized for not having one on your site. That being said, a sitemap is still super essential for SEO. Let’s see how to make it work.

Add Only Important and Compliant Pages

There’s no point in stuffing your sitemap with absolutely all pages from the website. Add only relevant pages you want to appear in search. Another key takeaway is to make sure all URLs in a sitemap are compliant. In other words, they should be accessible for search robots.

Make sure there’s no:

  • Broken pages (returning 4xx-5xx status codes)
  • Redirect pages (returning 3xx status codes)
  • Canonicalized pages
  • Duplicate and thin content pages
  • URLs with parameters, internal search pages, print pages, etc. (all these pages should already be hidden from search bots)
  • URLs blocked by robots.txt, robots meta tag, X-Robots-Tag
  • URLs present in another sitemap file
  • Utility pages (terms of use, contact, privacy policy pages, etc.)
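Part of the checklist above can be automated once you have crawl data. The sketch below assumes a made-up Page record with the fields a crawler export might give you (status code, noindex flag, robots.txt block, canonical URL). It covers only the mechanical checks, not judgment calls such as thin content or utility pages.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    """Hypothetical crawl record; field names are assumptions."""
    url: str
    status: int
    noindex: bool = False
    blocked_by_robots: bool = False
    canonical: Optional[str] = None


def is_sitemap_candidate(page):
    """Return True if the page passes the mechanical checks."""
    if page.status != 200:          # broken (4xx-5xx) or redirect (3xx)
        return False
    if page.noindex or page.blocked_by_robots:
        return False
    if page.canonical and page.canonical != page.url:
        return False                # canonicalized to another URL
    return True


pages = [
    Page("http://www.example.com/", 200),
    Page("http://www.example.com/old", 301),
    Page("http://www.example.com/missing", 404),
    Page("http://www.example.com/print", 200, noindex=True),
    Page("http://www.example.com/a?x=1", 200,
         canonical="http://www.example.com/a"),
]
candidates = [p.url for p in pages if is_sitemap_candidate(p)]
print(candidates)  # ['http://www.example.com/']
```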

Categorize Your Sitemaps

If your site has 500+ pages, they are probably divided into several categories like product pages, blog posts, or whatever you want. There’s no issue in placing all these pages in a single sitemap, but this way you are losing an opportunity to better analyze your sitemaps in the future.

Let me explain what I mean. When you submit a sitemap in Google Search Console (I’ll show how to do it a little bit later) it shows you a report on issues and the number of discovered and valid pages. Having a separate sitemap for product pages, you can track data that is specific to them.

Go to ‘Index’ then ‘Sitemaps’. Here you will see all submitted sitemaps, their status, and the number of discovered pages. Click on the icon next to the number of discovered URLs to see the detailed report on each sitemap, including the number of valid pages (indexed), excluded pages, errors, and warnings.

You can also go to ‘Index’ then ‘Coverage’, and filter the view by the corresponding sitemap.

Filter by ‘All submitted pages’ to see the report on all pages you added in your sitemaps.

Use Dynamic Sitemap

An XML sitemap is not something you can create and forget if you are planning to update your site, add, or remove pages. You need to refresh your sitemap after every significant change.

When you have a small site, this is not an issue at all. However, if you operate a relatively big e-commerce project or informational portal, it’s nearly impossible to create a new sitemap every time.

For WordPress users, there are a bunch of plugins that create a sitemap dynamically, such as All in One SEO Pack or Yoast SEO. If you are not a WordPress user, you can ask a developer to create a custom script for you.

Everything in a Sitemap Will Be Spotted Faster

Remember what I said about avoiding 3xx and 4xx pages in a sitemap? This is still a rule to follow in 99% of cases. However, if you want search engines to notice that a specific page is no longer available or now redirects to another one, a sitemap can speed up this process.

According to a tweet by Gary Illyes, anything you put in a sitemap will be picked up faster. That’s why you can create a separate sitemap where you list fresh redirects, 404 pages, etc.

Moreover, you should avoid having broken or redirecting links on your site, as they hurt your UX, so a sitemap is the best place to show search robots these changes.

5. How to Create XML Sitemap

Here we come to the most crucial part – sitemap creation. There are several methods you can use to complete this task:

  1. Create a sitemap manually or with a custom script. If you have some technical knowledge and want to rise against the machine, you can code your own sitemap generator (or ask a developer), which gives you the result best suited to your site.
  2. Use plugins in your CMS. Plugins like Yoast SEO and Google XML Sitemaps do a pretty good job of creating and customizing XML sitemaps.
  3. Use web-based generators like XML-Sitemaps. They usually have free limited versions that allow creating sitemaps up to a certain number of pages.
  4. Use the built-in XML sitemap generator in an SEO crawler (e.g. Netpeak Spider).

All options are suitable for small and medium websites. Though, if you want more flexibility, a web crawler and custom script are more appropriate.

Using the crawler, you can get all website pages and break them down into several segments like landing pages, blog posts, images, etc. Then create a sitemap for each segment.
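If your crawler only exports a flat list of URLs, you can do the segmentation yourself. The Python sketch below groups URLs by their first path segment. That rule, the "root" label, and the function name are all assumptions for illustration, since your site may need segments based on templates or page types instead.

```python
from collections import defaultdict
from urllib.parse import urlparse


def segment_urls(urls):
    """Group URLs by first path segment, e.g. /blog/... -> 'blog'."""
    segments = defaultdict(list)
    for url in urls:
        path = urlparse(url).path.strip("/")
        # Pages at the top level have no segment; file them under 'root'.
        key = path.split("/")[0] if path else "root"
        segments[key].append(url)
    return dict(segments)


segments = segment_urls([
    "http://www.example.com/",
    "http://www.example.com/blog/post-1",
    "http://www.example.com/blog/post-2",
    "http://www.example.com/products/shoes",
])
for name, group in segments.items():
    print(name, len(group))
```

Each group can then be written to its own sitemap file, which gives you per-category reports in Search Console.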

Tip: compress sitemaps into the .gz format with gzip to save space and bandwidth, but keep in mind that the uncompressed file must still not exceed the 50 MB limit.
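Here is a minimal Python sketch of that tip: it checks the uncompressed size first and only then compresses. The function name and the limit parameter are my own, and I am assuming the 50 MB limit means 52,428,800 bytes.

```python
import gzip

MAX_UNCOMPRESSED_BYTES = 50 * 1024 * 1024  # assumed byte value of 50 MB


def compress_sitemap(xml_bytes, limit=MAX_UNCOMPRESSED_BYTES):
    """Gzip sitemap bytes after checking the uncompressed size limit."""
    if len(xml_bytes) > limit:
        raise ValueError(
            f"sitemap is {len(xml_bytes)} bytes uncompressed; limit is {limit}"
        )
    return gzip.compress(xml_bytes)


xml_bytes = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>http://www.example.com/</loc></url></urlset>"
)
compressed = compress_sitemap(xml_bytes)
# Write `compressed` to sitemap.xml.gz to publish it.
```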

Most of the tools will also allow you to validate your sitemap to make sure it complies with the standard.
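If your tool has no validator, a basic sanity check is easy to script. The Python sketch below is not a full schema validation. It only checks a few things from this guide: well-formed XML, a <urlset> root in the sitemap namespace, the 50,000-URL limit, and full URLs in <loc>. The function name check_sitemap is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def check_sitemap(xml_text):
    """Return a list of problems found (empty if none)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        problems.append("root element is not <urlset> in the sitemap namespace")
    locs = [(el.text or "").strip() for el in root.iter(f"{{{SITEMAP_NS}}}loc")]
    if len(locs) > MAX_URLS:
        problems.append(f"{len(locs)} URLs exceed the {MAX_URLS} limit")
    for loc in locs:
        parts = urlparse(loc)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"<loc> is not a full URL: {loc!r}")
    return problems


sample = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>http://www.example.com/</loc></url>"
    "<url><loc>/relative/page</loc></url>"
    "</urlset>"
)
print(check_sitemap(sample))
```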

Add Sitemap to the Root Folder and Robots.Txt

Once a sitemap is successfully created, upload it to the root folder of the site just as you usually do with other files. It has to be like this – yourbeautifulsite.com/sitemap.xml.

Next, go to the robots.txt file and add a line with your sitemap address so that search robots can spot it each time they request the file. If you have multiple sitemap URLs, add each one on a new line.
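To double-check that robots can actually read those lines, you can parse the file the way a crawler would. This Python sketch uses the standard library's robots.txt parser (site_maps() needs Python 3.8 or newer). The robots.txt content, the https scheme, and the sitemap-blog.xml file name are made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the second sitemap is hypothetical.
robots_txt = """\
User-agent: *
Disallow: /search/

Sitemap: https://yourbeautifulsite.com/sitemap.xml
Sitemap: https://yourbeautifulsite.com/sitemap-blog.xml
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

# site_maps() returns the listed sitemap URLs, or None if there are none.
sitemaps = parser.site_maps()
print(sitemaps)
```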

6. Submit Sitemap to Search Engines

The last step is to submit the sitemaps you created to Google Search Console and Bing Webmaster Tools.

Let’s start with Google:

  1. Open Google Search Console.
  2. Go to ‘Index’ then ‘Sitemaps’.
  3. Enter the sitemap URL in the ‘Add a new sitemap’ field and click ‘Submit’.

Submit to Bing:

  1. Go to the ‘Sitemaps’ widget on the dashboard.
  2. Click ‘Submit a Sitemap’ and enter your sitemap address.

Ping Sitemap Through an HTTP Request

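There’s an alternative and more technical way to notify search engines of your sitemap: an HTTP request to a ping URL. Keep in mind that Google announced the deprecation of its sitemap ping endpoint in 2023, so check each search engine’s current documentation before relying on the URLs below.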

Ping Google: http://www.google.com/ping?sitemap=URL/of/sitemap

Ping Bing: http://www.bing.com/ping?sitemap=URL/of/sitemap

Sitemap Best Practices

Now that you’ve learned sitemaps from soup to nuts, let’s summarize what you should remember about sitemaps:

  • A sitemap improves crawling of your website
  • It does not guarantee your pages will be indexed
  • Keep your sitemap up-to-date
  • Add only compliant pages to a sitemap
  • Don’t forget about the limits
  • Use hreflang for localized page versions
  • Create separate sitemaps for images, videos, news
  • Don’t focus on the <priority> and <changefreq> tags
  • Categorize your sitemaps
  • Create a sitemap index
  • Compress your sitemaps
  • Submit them in Webmaster Tools
  • Add sitemap address in robots.txt
  • Analyze sitemap indexing and fix possible issues

Keep your maps clean and feel free to ask your questions in the comments.
