What Is a Sitemap

Last Updated in February 2021 by Lukasz Zelezny

Guide to perfect Sitemap XML for SEO

As the web evolves, so do SEO and Google.

As a result, what we considered best practice yesterday may no longer apply today. This is especially true for sitemaps, which are almost as old as SEO itself.

Although a lot of information is available online, it is hard to separate the valuable advice from the misinformation.

Most of us think that submitting a sitemap is important, but we often fail to understand the intricacies of implementing one in a way that drives SEO KPIs.

Here is detailed information on the best practices related to sitemaps today:

What is a sitemap?

Think of a sitemap as a blueprint of your website. It helps visitors find your content, helps search engines find, crawl, and index that content, and tells them which pages on your site you consider most important.

What makes sitemaps important?

If the pages on your website are linked properly, web crawlers can easily discover what your site is about.

Although you do not strictly need a sitemap, it can boost your SEO efforts, so it is recommended that you use one when you are building your site.

There are also some special cases where a sitemap is particularly useful.

For instance, Google mostly finds web pages through links. If you have a completely new site with very few external backlinks, a sitemap can help Google discover your pages.

Another scenario: if you own an e-commerce site with more than 5 million pages, then unless your internal and external linking is perfect, search engines may struggle to find every page of your website. This is where a sitemap can take you a long way.

What kind of websites require a sitemap?

According to Google’s documentation, XML sitemaps matter most for extremely large websites, such as e-commerce sites. Websites that have few external links or large archives also benefit from a sitemap, as do websites that publish new content regularly.

These kinds of websites benefit the most, but an XML sitemap can be useful for every website. Every site needs search engines to find its priority pages easily and to know when they were last updated, which is why installing a suitable sitemap plugin is also worthwhile.

Classification of XML Sitemaps

Sitemaps are classified into different types. Let’s take a look at the ones that you would require:

What is the Sitemap Index for XML files?

XML sitemaps come with some limitations which are as follows:

  • No more than 50,000 URLs.
  • The limit of uncompressed file size is 50MB

You can compress sitemaps using gzip to save bandwidth on your server. However, the limits above apply to the unzipped file, so compression does not let a sitemap exceed them.
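
As a rough illustration of these limits, here is a minimal Python sketch that gzips a sitemap and checks both limits against the uncompressed data. The function names and the example.com URL are assumptions made for this example, and 50MB is taken here as 50 × 1024 × 1024 bytes.

```python
import gzip

MAX_URLS = 50_000              # URL limit quoted above
MAX_BYTES = 50 * 1024 * 1024   # 50MB, measured on the uncompressed file


def within_limits(xml_bytes: bytes, url_count: int) -> bool:
    """Return True if the uncompressed sitemap respects both limits."""
    return url_count <= MAX_URLS and len(xml_bytes) <= MAX_BYTES


def compress_sitemap(xml_bytes: bytes) -> bytes:
    """Gzip the sitemap to save bandwidth; the limits are unaffected."""
    return gzip.compress(xml_bytes)


# Hypothetical one-URL sitemap used only for demonstration.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>'
    b'<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    b"<url><loc>https://example.com/</loc></url></urlset>"
)
packed = compress_sitemap(sample)
```

The check is done before compression on purpose, because a small .gz file can still hide an oversized sitemap.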

When you exceed either limit, you need to split your URLs across several XML sitemaps. These sitemaps can then be combined in a single XML sitemap index file, often named sitemap-index.xml, which is essentially a sitemap for sitemaps.
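
A sitemap index could be generated along these lines. This is only a sketch: the function name and the child sitemap URLs are hypothetical, and the XML is built with Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(sitemap_urls):
    """Return sitemap index XML that lists each child sitemap URL."""
    root = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        entry = ET.SubElement(root, "sitemap")
        ET.SubElement(entry, "loc").text = url
    return ET.tostring(root, encoding="unicode")


# Hypothetical child sitemaps for illustration.
index_xml = build_sitemap_index([
    "https://example.com/sitemap-products.xml",
    "https://example.com/sitemap-articles.xml",
])
```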

On exceptionally large websites that need a more granular approach, you can create several sitemap index files. For example:

  • sitemap-index-categories.xml
  • sitemap-index-articles.xml
  • sitemap-index-products.xml

Be aware, however, that you cannot nest sitemap index files. If you want search engines to find all of your sitemap files at once, do the following:

  • Submit the sitemap index(es) to Bing Webmaster Tools and Google Search Console
  • Specify the sitemap index URL(s) in your robots.txt file. This points search engines directly to your sitemap as soon as they start to crawl.
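
The robots.txt reference can be sketched as follows. The helper name and the example.com URL are assumptions; the standard-library RobotFileParser (whose site_maps() method needs Python 3.8 or later) is used only to confirm that the Sitemap line is readable.

```python
from urllib.robotparser import RobotFileParser


def robots_txt_with_sitemaps(index_urls):
    """Build a permissive robots.txt that advertises the sitemap index URLs."""
    lines = ["User-agent: *", "Allow: /"]
    lines += [f"Sitemap: {url}" for url in index_urls]
    return "\n".join(lines) + "\n"


# Hypothetical sitemap index location.
robots = robots_txt_with_sitemaps(["https://example.com/sitemap-index.xml"])

# Parse the generated file back to check what a crawler would see.
parser = RobotFileParser()
parser.parse(robots.splitlines())
found = parser.site_maps()
```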

You can also submit individual sitemaps to Google directly.

Image Sitemap

Image sitemaps were designed to improve the indexation of image content. In present-day SEO, however, images are embedded in the content of web pages, so they are crawled along with the page URL.

In addition, you can use JSON-LD markup to call out image properties to search engines, and it provides more attributes than an XML image sitemap.

For this reason, an XML image sitemap is unnecessary for most websites. Adding one will only waste crawl budget.

The exception is when images help drive your business, as with e-commerce or stock photo websites that gain product page sessions from Google Image Search.

Note that the images do not need to be on the same domain as your website. You can also use a CDN, as long as it is verified in Search Console.
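
For sites where an image sitemap is worthwhile, an entry could be built roughly like this. It is a sketch only: the function name, page URL, and CDN hostname are hypothetical, and the image namespace is the one used by Google's image sitemap extension.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"


def build_image_sitemap(pages):
    """pages maps a page URL to a list of image URLs (possibly on a CDN)."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical product page with one image hosted on a different domain.
image_sitemap = build_image_sitemap({
    "https://example.com/products/1": ["https://cdn.example.net/img/1.jpg"],
})
```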

Video Sitemap

As with images, if videos are crucial to your business, you can submit an XML video sitemap. If they are not, a video sitemap is unnecessary.

In that case, save your crawl budget for the pages in which the videos are embedded, and make sure that you use markup for all the videos.

Google News Sitemap

Only sites that are registered with Google News should use this sitemap. Include only recently published articles, up to a limit of 1,000 URLs per sitemap, and update the sitemap with new articles as they are published.

Contrary to some online advice, Google News sitemaps do not support image URLs. Google recommends that you use og:image or schema.org image to specify the thumbnail for your article in Google News.

Mobile Sitemap

Most websites do not need a mobile sitemap. Mobile sitemaps are meant for feature phone pages and are not compatible with smartphones.

So, unless you have unique URLs designed specifically for feature phones, do not use a mobile sitemap.

HTML Sitemap

HTML sitemaps are designed to help people find content.

A common question is whether you still need an HTML sitemap if your site already offers a good user experience and well-crafted internal links.

Check the number of page views your HTML sitemap receives in Google Analytics. If the number is low, few visitors rely on it. If it is high, that is a sign that you should improve the navigation of your website.

HTML sitemaps are usually linked from the website footer, so they take link equity from every page on your website.

It is up to you to decide whether an HTML sitemap is the best use of that link equity.

Dynamic Sitemap

Static sitemaps can be created easily with tools like Screaming Frog.

The major issue with a static sitemap is that it becomes outdated as soon as you create or remove a page. If you modify the content of a page, the sitemap will not update the lastmod tag automatically either.

So, unless you are willing to create and upload a new sitemap manually for every single change, it is best to avoid static sitemaps.

Dynamic XML sitemaps, by contrast, are updated automatically by your server so that they reflect changes to the website as they occur.

To create a dynamic XML sitemap, you can do one of the following:

  • Ask your developer to code a custom script, and provide clear specifications for it
  • Use a dynamic sitemap generator tool
  • Install a plugin for the CMS you are using
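
To give a feel for the custom-script option, here is a minimal sketch that renders a sitemap from page records on each request. The function name and the sample pages are hypothetical; in practice the (URL, last modified) pairs would come from your CMS or database.

```python
import xml.etree.ElementTree as ET
from datetime import date

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def render_sitemap(pages):
    """pages: iterable of (url, last_modified_date) tuples from your data store."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url, last_modified in pages:
        entry = ET.SubElement(urlset, "url")
        ET.SubElement(entry, "loc").text = url
        # lastmod follows the stored modification date, so it never goes stale.
        ET.SubElement(entry, "lastmod").text = last_modified.isoformat()
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical records standing in for a database query.
pages = [
    ("https://example.com/", date(2021, 2, 1)),
    ("https://example.com/blog/sitemaps", date(2021, 2, 15)),
]
sitemap_xml = render_sitemap(pages)
```

Because the XML is rendered from the current records, adding or removing a page changes the sitemap without any manual upload.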

Dynamic XML sitemaps and a sitemap index are current best practice. HTML and mobile sitemaps are not.

Use video, image, and Google News sitemaps only if improved indexation of those content types helps drive your KPIs.

XML Sitemap Indexation Optimization

If you want to drive SEO KPIs, include only SEO-relevant pages in your XML sitemaps.

An XML sitemap is a list of the pages that you recommend be crawled, which is not necessarily every page of your website.

A search spider arrives at your website with an allowance for how many pages it will crawl. The XML sitemap indicates that you consider the included URLs more important than those that are not blocked but are not in the sitemap.

In other words, you are telling search engines that you would rather they focus on these URLs in particular. This makes the use of your crawl budget more effective.

By including only SEO-relevant pages, you help search engines crawl your site more intelligently and reap the benefits of better indexation.

    Measures You Need to Take to Perfect Sitemap XML for SEO

    Use Plugins & Tools that Generate Sitemaps Automatically

    Generating a sitemap is easy when you have the right tools, such as auditing software with a built-in XML sitemap generator or a popular plugin.

    WordPress websites that use Yoast SEO can enable XML sitemaps directly in the plugin.

    Another way to set up a sitemap is to create it manually by following the XML sitemap code structure.

    Technically, your sitemap does not even need to be in XML format. A text file with one URL per line is enough.
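
    A text sitemap of this kind could be produced as follows. This is a small sketch with hypothetical names and URLs; duplicates are dropped while the original order is kept.

```python
def text_sitemap(urls):
    """Return a plain-text sitemap: one URL per line, duplicates removed."""
    unique_urls = list(dict.fromkeys(urls))  # keeps the first occurrence of each URL
    return "\n".join(unique_urls) + "\n"


# Hypothetical URLs; the second entry is repeated on purpose.
sitemap_txt = text_sitemap([
    "https://example.com/",
    "https://example.com/about",
    "https://example.com/about",
])
```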

    If you are planning to implement the hreflang attribute, however, you will need to generate a complete XML sitemap, in which case it is much easier to let a tool do the work for you.
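
    For illustration, hreflang annotations in an XML sitemap are written as xhtml:link elements inside each url entry. The sketch below assumes a hypothetical page with English and German versions; the function name is also an assumption.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"


def build_hreflang_sitemap(alternates):
    """alternates maps a language code to the URL of that language version."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("xhtml", XHTML_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url in alternates.values():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        # Every version lists all versions, including itself.
        for lang, href in alternates.items():
            ET.SubElement(
                url, f"{{{XHTML_NS}}}link",
                rel="alternate", hreflang=lang, href=href,
            )
    return ET.tostring(urlset, encoding="unicode")


# Hypothetical language versions of one page.
hreflang_xml = build_hreflang_sitemap({
    "en": "https://example.com/en/",
    "de": "https://example.com/de/",
})
```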

    You can also check out Google or Bing pages for more information on the ways to set up the sitemap automatically.

    Submit the Sitemap to Google

    Go to the Google Search Console dashboard and click Crawl > Sitemaps > Add/Test Sitemap to submit your sitemap to Google.

    Test the sitemap and review the results before you click Submit Sitemap, so that you can catch errors that might prevent key landing pages from being indexed.

    Ideally, the number of pages indexed should be the same as the number of pages submitted.

    By submitting your sitemap, you are telling Google which pages you consider high quality and worthy of indexation, but this does not guarantee that those pages will be indexed.

    By submitting your sitemap, you will be able to:

    • Help Google understand the manner in which the website is laid out.
    • Discover errors that can be corrected to ensure that your pages are indexed properly.

    Give Priority to High-Quality Pages in Your Sitemap

    Overall site quality is a key factor where ranking is concerned.

    If your sitemap directs bots to low-quality pages, search engines may interpret this as a sign that your website is not one people will want to visit, even if those pages are necessary for your site, such as login pages.

    Instead, you should rather direct bots to the high priority pages on your site. These pages need to:

    • Be highly optimised.
    • Contain images & video.
    • Contain unique content.
    • Enable user engagement through reviews and comments.

    Google Search Console can be frustrating when it does not index all of your pages, because it does not tell you which pages are problematic.

    For instance, if you submit 20,000 pages and only 15,000 of them are indexed, you will not be told which 5,000 are the “problem pages.”

    This is a common problem for e-commerce websites that have many pages for very similar products.

    It is recommended that you split the product pages into multiple XML sitemaps and test each one of them.

    Create sitemaps that test hypotheses, such as “pages without product images are not getting indexed” or “pages that don’t have unique copy are not getting indexed.”

    When you have isolated the major problems, you can either fix them or set the pages to “noindex,” so that they do not affect the overall quality of your website.
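
    The split-and-test approach can be summarised with a small calculation. In this sketch the sitemap names and the page counts are invented for illustration; they simply add up to 20,000 submitted and 15,000 indexed pages, as in the example above.

```python
def indexation_rates(sitemaps):
    """sitemaps maps a sitemap name to (submitted, indexed) page counts.

    Returns (name, pages_not_indexed, indexation_rate) rows, worst rate first.
    """
    rows = []
    for name, (submitted, indexed) in sitemaps.items():
        rate = indexed / submitted if submitted else 0.0
        rows.append((name, submitted - indexed, rate))
    return sorted(rows, key=lambda row: row[2])


# Hypothetical counts read from the sitemap reports.
report = indexation_rates({
    "products-with-images.xml": (12_000, 11_500),
    "products-without-images.xml": (8_000, 3_500),
})
for name, missing, rate in report:
    print(f"{name}: {missing} pages not indexed, {rate:.0%} indexed")
```

    The sitemap at the top of the report is the hypothesis worth investigating first.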

    Include the Canonical Versions of URLs in the Sitemap

    If your website has several pages that are very similar, such as near-identical product pages, use the “link rel=canonical” tag to tell Google which one is the “main” page that it should crawl and index.

    It is easier for bots to discover key pages if you leave out pages whose canonical URLs point at other pages.
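
    As a sketch of that rule, the filter below keeps only pages whose canonical URL is the page itself. The record shape (a dict with url and canonical keys) and the sample URLs are assumptions made for this example.

```python
def canonical_urls_only(pages):
    """Keep only pages whose canonical URL is the page's own URL."""
    return [page["url"] for page in pages if page["canonical"] == page["url"]]


# Hypothetical product page with a colour variant that canonicalises to it.
pages = [
    {"url": "https://example.com/shirt", "canonical": "https://example.com/shirt"},
    {"url": "https://example.com/shirt?color=red", "canonical": "https://example.com/shirt"},
]
sitemap_urls = canonical_urls_only(pages)
```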

    Use Meta Robots Instead of Robots.txt Whenever Possible

    If you don’t want a page to be indexed, you can usually use the meta robots “noindex, follow” tag.

    This tag prevents Google from indexing the page while preserving your link equity. It is especially useful for utility pages that are important to your site but should not appear in search results.

    Use robots.txt to block pages only when you are running low on crawl budget.

    If you see that Google is re-crawling and indexing low-priority pages at the expense of core pages, robots.txt is the better option.

    Create Dynamic Sitemaps for Larger Websites

    On large websites, it can become hard to keep up with all the meta robots tags.

    Instead, you can set up rules that determine when a page is included in your XML sitemap and when it is changed from “noindex” to “index, follow.”

    Detailed instructions for building a dynamic XML sitemap are available, but you can skip that step and use a tool that generates dynamic sitemaps to get things moving quickly.

    Use RSS/Atom Feeds & XML Sitemaps

    RSS/Atom feeds notify search engines when you update a page or add new content to your website.

    Google recommends using both sitemaps and RSS/Atom feeds to help search engines determine which pages need to be indexed and updated.

    By including only recently updated content in your RSS/Atom feeds, you make it easier for both search engines and visitors to find new content.

    Update Modification Times Only for Substantial Changes

    Do not try to trick search engines into re-indexing pages by updating their modification time without making any substantial changes.

    Avoid Including ‘noindex’ URLs in Your Sitemap

    Speaking of wasted crawl budget, if search engine robots are not allowed to index certain pages, there is no point in including those pages in your sitemap.

    If your sitemap includes “noindex” or blocked pages, you are telling Google at the same time that it is important to index a page and that it is not allowed to index that page.

    This lack of consistency is a common mistake.
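
    A consistency check of this kind could look like the sketch below. The function name, the sample URLs, and the robots.txt rules are hypothetical; the robots.txt test uses Python's standard urllib.robotparser, and the set of noindex URLs is assumed to come from your own crawl data.

```python
from urllib.robotparser import RobotFileParser


def find_conflicts(sitemap_urls, noindex_urls, robots_txt):
    """Return sitemap URLs that are noindexed or blocked, with the reason."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    noindex = set(noindex_urls)
    conflicts = {}
    for url in sitemap_urls:
        if url in noindex:
            conflicts[url] = "noindex"
        elif not parser.can_fetch("*", url):
            conflicts[url] = "blocked by robots.txt"
    return conflicts


# Hypothetical inputs for demonstration.
conflicts = find_conflicts(
    sitemap_urls=[
        "https://example.com/",
        "https://example.com/login",
        "https://example.com/cart/checkout",
    ],
    noindex_urls=["https://example.com/login"],
    robots_txt="User-agent: *\nDisallow: /cart/\n",
)
```

    Any URL returned here should either be removed from the sitemap or made indexable.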

    Do Not Rely Too Much on Priority Settings

    Some sitemaps have a “Priority” column that is meant to tell search engines which pages are the most important.

    However, many people remain skeptical about whether this feature actually works.

    Use a smaller sitemap

    A smaller sitemap puts less strain on your server.

    Both Google and Bing increased the accepted sitemap file size from 10 MB to 50 MB, with a limit of 50,000 URLs per sitemap.

    Although these limits are sufficient for most sites, some webmasters will need to split their pages across two or more sitemaps.

    For instance, if you are running an online store that has 200,000 pages, you will need at least four separate sitemaps (200,000 ÷ 50,000) to cover everything.
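
    The arithmetic behind such a split can be sketched as follows. By URL count alone, 200,000 pages fit into four files of 50,000 URLs each, although you may choose to split further, for example by page type or to stay under the file size limit. The function names and URLs are hypothetical.

```python
import math


def sitemaps_needed(total_urls, per_sitemap=50_000):
    """Minimum number of sitemap files by URL count alone."""
    return math.ceil(total_urls / per_sitemap)


def split_urls(urls, per_sitemap=50_000):
    """Split a list of URLs into chunks that each fit in one sitemap."""
    return [urls[i:i + per_sitemap] for i in range(0, len(urls), per_sitemap)]


# Hypothetical store with 200,000 product URLs.
store_urls = [f"https://example.com/product/{n}" for n in range(200_000)]
chunks = split_urls(store_urls)
```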

    Don’t Create a Sitemap If It Is Not Required

    Not every website requires a sitemap.

    Google can find and index pages quite accurately on its own. Moreover, a sitemap alone will not bring great SEO value.

    If you are building a one-pager, a portfolio website, or an organization website that you do not need to update regularly, you can do without a sitemap.

    However, if you publish fresh content regularly and want it indexed as soon as possible, or if your website has hundreds of thousands of pages, as many e-commerce websites do, you can use a sitemap to provide that information directly to Google.

    XML Sitemap Best Practice Checklist

    Make sure that you invest your time in practising the following:

    • Use hreflang tags in XML sitemaps
    • Use <lastmod> and <loc> tags
    • Use gzip to compress sitemap files
    • Use a sitemap index file
    • Use video, image, and Google News sitemaps only if their indexation drives your KPIs
    • Generate video XML sitemaps dynamically
    • Ensure URLs are included only in a single sitemap
    • Reference sitemap index URLs in robots.txt
    • Submit sitemap index to both Google Search Console and Bing Webmaster Tools
    • Include only SEO relevant pages in XML sitemaps
    • Work on all the errors & warnings
    • Analyze the latest trends and different types of valid pages
    • Calculate the number of submitted pages & indexation rates
    • Address causes of exclusion for submitted pages

    With these points in mind, you can now check your sitemap and ensure that you’re on the right track.

    To sum up, by applying these techniques you should be able to make your website easier to find and more accessible to users. Over time, this can help improve both your search engine rankings and your user engagement.