About Sitemaps

Notes on sitemaps and how to best handle them for single page applications.


Notes


Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL so that search engines can more intelligently crawl the site.

Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data, allowing crawlers that support Sitemaps to pick up the URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but it provides hints that help web crawlers do a better job of crawling your site.


Sitemaps XML format


  • The Sitemap protocol format consists of XML tags.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <url>
      <loc>http://www.example.com/</loc>
      <lastmod>2005-01-01</lastmod>
      <changefreq>monthly</changefreq>
      <priority>0.8</priority>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
      <changefreq>weekly</changefreq>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=73&amp;desc=vacation_new_zealand</loc>
      <lastmod>2004-12-23</lastmod>
      <changefreq>weekly</changefreq>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
      <lastmod>2004-12-23T18:00:15+00:00</lastmod>
      <priority>0.3</priority>
   </url>
   <url>
      <loc>http://www.example.com/catalog?item=83&amp;desc=vacation_usa</loc>
      <lastmod>2004-11-23</lastmod>
   </url>
</urlset>
  • All data values in a Sitemap (including URLs) must be entity-escaped, and the file itself must be UTF-8 encoded.

Character       Symbol   Escape Code
Ampersand       &        &amp;
Single Quote    '        &apos;
Double Quote    "        &quot;
Greater Than    >        &gt;
Less Than       <        &lt;
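
The escape table above can be applied programmatically. The following is a minimal sketch in Python, assuming values are escaped by hand before being written into the file; the helper name escape_sitemap_value is hypothetical. It relies on the standard library function xml.sax.saxutils.escape, which handles &, < and > by default and accepts extra replacements for the two quote characters.

```python
from xml.sax.saxutils import escape

# Extra entities beyond the defaults (&, <, >) that escape() already handles.
QUOTE_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_sitemap_value(value: str) -> str:
    """Entity-escape a data value (hypothetical helper) per the table above."""
    return escape(value, QUOTE_ENTITIES)


if __name__ == "__main__":
    print(escape_sitemap_value(
        "http://www.example.com/catalog?item=12&desc=vacation_hawaii"
    ))
```

If the file is produced with an XML library instead of string concatenation, the library normally performs this escaping itself, so applying the helper as well would double-escape the values.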

  • The Sitemap must:
    • Begin with an opening <urlset> tag and end with a closing </urlset> tag
    • Specify the namespace (protocol standard) within the <urlset> tag
    • Include a <url> entry for each URL, as a parent XML tag
    • Include a <loc> child entry for each <url> parent tag
  • All other tags are optional, and support for them may vary among search engines.

Tag (required or optional) and description:

  • <urlset> (required): Encapsulates the file and references the current protocol standard.
  • <url> (required): Parent tag for each URL entry. The remaining tags are children of this tag.
  • <loc> (required): URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it. This value must be less than 2,048 characters. Note that URLs must be escaped; it is best to use something like encodeURI() or an equivalent.
  • <lastmod> (optional): The date of last modification of the page. This date should be in the W3C Datetime format (see below). This format allows you to omit the time portion if desired.
  • <changefreq> (optional): How frequently this page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values:
    • always
    • hourly
    • daily
    • weekly
    • monthly
    • yearly
    • never
  • <priority> (optional): The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. The default priority is 0.5.
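
For a single page application, where routes are usually known in code rather than as files on disk, the file can be generated from a list of routes. The following is a minimal sketch in Python using the standard library xml.etree.ElementTree module; the function name build_urlset and the shape of the entries argument are assumptions made for illustration. ElementTree escapes characters such as & in text values on its own.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
OPTIONAL_TAGS = ("lastmod", "changefreq", "priority")


def build_urlset(entries):
    """Build a Sitemap string from dicts with 'loc' and optional child tags."""
    # Register the Sitemap namespace as the default so tags have no prefix.
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for entry in entries:
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # <loc> is the only required child of <url>.
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = entry["loc"]
        for tag in OPTIONAL_TAGS:
            if tag in entry:
                ET.SubElement(url, f"{{{SITEMAP_NS}}}{tag}").text = str(entry[tag])
    xml_bytes = ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


if __name__ == "__main__":
    print(build_urlset([
        {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
         "changefreq": "monthly", "priority": 0.8},
        {"loc": "http://www.example.com/catalog?item=12&desc=vacation_hawaii",
         "changefreq": "weekly"},
    ]))
```

The sketch does not validate the values it is given (for example, that changefreq is one of the listed values or that priority lies between 0.0 and 1.0).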

W3C Datetime Format


  • Year:
    • YYYY (e.g. 1997)
  • Year and month:
    • YYYY-MM (e.g. 1997-07)
  • Complete date:
    • YYYY-MM-DD (e.g. 1997-07-16)
  • Complete date plus hours and minutes:
    • YYYY-MM-DDThh:mmTZD (e.g. 1997-07-16T19:20+01:00)
  • Complete date plus hours, minutes and seconds:
    • YYYY-MM-DDThh:mm:ssTZD (e.g. 1997-07-16T19:20:30+01:00)
  • Complete date plus hours, minutes, seconds and a decimal fraction of a second:
    • YYYY-MM-DDThh:mm:ss.sTZD (e.g. 1997-07-16T19:20:30.45+01:00)

where:

  • YYYY = four-digit year
  • MM = two-digit month (01=January, etc.)
  • DD = two-digit day of month (01 through 31)
  • hh = two digits of hour (00 through 23) (am/pm NOT allowed)
  • mm = two digits of minute (00 through 59)
  • ss = two digits of second (00 through 59)
  • s = one or more digits representing a decimal fraction of a second
  • TZD = time zone designator (Z or +hh:mm or -hh:mm)
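
As an illustration of the formats above, here is a small Python sketch that produces either the complete date form or the complete date plus hours, minutes and seconds form. The helper name w3c_datetime is hypothetical, and the sketch assumes that datetimes carry time zone information so that a TZD can be emitted.

```python
from datetime import date, datetime, timedelta, timezone


def w3c_datetime(value):
    """Format a date or timezone-aware datetime as a W3C Datetime string."""
    # datetime is a subclass of date, so it has to be checked first.
    if isinstance(value, datetime):
        if value.tzinfo is None:
            raise ValueError("datetime must be timezone-aware to emit a TZD")
        # YYYY-MM-DDThh:mm:ssTZD, with the offset written as +hh:mm or -hh:mm
        return value.isoformat(timespec="seconds")
    # YYYY-MM-DD
    return value.isoformat()


if __name__ == "__main__":
    print(w3c_datetime(date(1997, 7, 16)))
    tz = timezone(timedelta(hours=1))
    print(w3c_datetime(datetime(1997, 7, 16, 19, 20, 30, tzinfo=tz)))
```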


Using Sitemap Index Files


You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 50MB. If you want to list more than 50,000 URLs, you must create multiple Sitemap files. Each of these Sitemap files should be listed in a sitemap index file. Sitemap index files may not list more than 50,000 sitemaps and must be no larger than 50MB.

  • The Sitemap index file must:
    • Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag
    • Include a <sitemap> entry for each Sitemap as a parent XML tag
    • Include a <loc> child entry for each <sitemap> parent tag
  • The optional <lastmod> tag is also available for Sitemap index files.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
   <sitemap>
      <loc>http://www.example.com/sitemap1.xml.gz</loc>
      <lastmod>2004-10-01T18:23:17+00:00</lastmod>
   </sitemap>
   <sitemap>
      <loc>http://www.example.com/sitemap2.xml.gz</loc>
      <lastmod>2005-01-01</lastmod>
   </sitemap>
</sitemapindex>

Tag (required or optional) and description:

  • <sitemapindex> (required): Encapsulates information about all of the Sitemaps in the file.
  • <sitemap> (required): Encapsulates information about an individual Sitemap.
  • <loc> (required): Identifies the location of the Sitemap. This location can be a Sitemap, an Atom file, an RSS file, or a simple text file.
  • <lastmod> (optional): Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that the pages listed in the Sitemap were changed. This should be in W3C Datetime format.
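
The 50,000 URL limit and the index file format described above can be combined in a short script. This is a sketch only, in Python with the standard library; the function names chunk_urls and build_sitemap_index and the sitemapN.xml naming scheme are assumptions, and the 50MB size limit is not checked here.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Split a list of URLs into groups that each fit in one Sitemap file."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_locs):
    """Build a Sitemap index string listing one <sitemap> per location."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(sitemap, f"{{{SITEMAP_NS}}}loc").text = loc
    xml_bytes = ET.tostring(index, encoding="UTF-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


if __name__ == "__main__":
    urls = [f"http://www.example.com/page{i}" for i in range(120_000)]
    chunks = chunk_urls(urls)
    # Hypothetical naming scheme: sitemap1.xml, sitemap2.xml, ...
    locs = [f"http://www.example.com/sitemap{n}.xml"
            for n in range(1, len(chunks) + 1)]
    print(build_sitemap_index(locs))
```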

The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/images/.
[...]
It is strongly recommended that you place your Sitemap at the root directory of your web server. For example, if your web server is at example.com, then your Sitemap index file would be at http://example.com/sitemap.xml.

Once you have created the Sitemap file and placed it on your webserver, you need to inform the search engines that support this protocol of its location. You can do this by:

  • submitting it to them via the search engine's submission interface
  • specifying the location in your site's robots.txt file
    • Add a line like this to the file:
      • Sitemap: http://www.example.com/sitemap.xml
  • sending an HTTP request
    • Send a request using the following scheme (replace <searchengine_URL> with the URL provided by the search engine, and URL-encode the Sitemap URL that follows /ping?sitemap=):
      • <searchengine_URL>/ping?sitemap=sitemap_url
      • Example: https://google.com/ping?sitemap=https%3A%2F%2Ffrankmbrown.net%2Fsitemap.xml
    • A successful request will return an HTTP 200 status code.
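
The robots.txt line and the ping URL from the list above can both be derived from the Sitemap URL. The sketch below, in Python, only builds the strings and sends nothing; the function names and the searchengine.example host are placeholders, and whether a particular search engine accepts ping requests at all is something to confirm in its own documentation.

```python
from urllib.parse import quote


def build_ping_url(search_engine_url, sitemap_url):
    """Build <searchengine_URL>/ping?sitemap=<URL-encoded sitemap URL>."""
    # safe="" percent-encodes ':' and '/' as well as other reserved characters.
    encoded = quote(sitemap_url, safe="")
    return f"{search_engine_url.rstrip('/')}/ping?sitemap={encoded}"


def robots_txt_line(sitemap_url):
    """Build the line that advertises a Sitemap in robots.txt."""
    return f"Sitemap: {sitemap_url}"


if __name__ == "__main__":
    sitemap = "http://www.example.com/sitemap.xml"
    print(robots_txt_line(sitemap))
    print(build_ping_url("https://searchengine.example", sitemap))
```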


What Google Says About Sitemaps


A sitemap is a file where you provide information about the pages, videos, and other files on your site, and the relationships between them. Search engines like Google read this file to crawl your site more efficiently. A sitemap tells search engines which pages and files you think are important in your site, and also provides valuable information about these files.

You can use a sitemap to provide information about specific types of content on your pages, including video, image, and news content.

Image Sitemaps


Image sitemaps are a way of telling Google about other images on your site, especially those that we might not otherwise find (such as images your site reaches with JavaScript code). You can create a separate image sitemap or add sitemap tags to your existing sitemap; either approach is equally fine for Google.

Required Tags

  • <image:image>: Encloses all information about a single image. Each <url> tag can contain up to 1,000 <image:image> tags.
  • <image:loc>: The URL of the image. In some cases, the image URL may not be on the same domain as your main site. This is fine, as long as you verify both domains in Search Console. If, for example, you use a content delivery network such as Google Sites to host your images, make sure that the hosting site is verified in Search Console. In addition, make sure that your robots.txt file doesn't disallow the crawling of any content you want indexed.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/sample1.html</loc>
    <image:image>
      <image:loc>https://example.com/image.jpg</image:loc>
    </image:image>
    <image:image>
      <image:loc>https://example.com/photo.jpg</image:loc>
    </image:image>
  </url>
  <url>
    <loc>https://example.com/sample2.html</loc>
    <image:image>
      <image:loc>https://example.com/picture.jpg</image:loc>
    </image:image>
  </url>
</urlset>
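
An image sitemap like the one above could be generated along the following lines. This is a sketch in Python with the standard library, assuming the images for each page are already known; build_image_sitemap and its pages argument are hypothetical names. It enforces the limit of 1,000 <image:image> tags per <url> mentioned above.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"
MAX_IMAGES_PER_URL = 1000


def build_image_sitemap(pages):
    """Build an image sitemap from a dict of page URL -> list of image URLs."""
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("image", IMAGE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        if len(image_urls) > MAX_IMAGES_PER_URL:
            raise ValueError(f"{page_url} lists more than {MAX_IMAGES_PER_URL} images")
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            # Each image gets its own <image:image> wrapper with one <image:loc>.
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    xml_bytes = ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)
    return xml_bytes.decode("utf-8")


if __name__ == "__main__":
    print(build_image_sitemap({
        "https://example.com/sample1.html": [
            "https://example.com/image.jpg",
            "https://example.com/photo.jpg",
        ],
        "https://example.com/sample2.html": ["https://example.com/picture.jpg"],
    }))
```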


News Sitemaps


If you are a news publisher, use news sitemaps to tell Google about your news articles and additional information about them. You can either extend your existing sitemap with news-specific tags, or create a separate news sitemap that's reserved just for your news articles.
News Sitemap Best Practices
  • Generic sitemap best practices apply to news sitemaps
  • Update your news sitemap with fresh articles as they're published. Don't create a new sitemap with each update. Google News crawls news sitemaps as often as it crawls the rest of your site.
  • Only include recent URLs for articles that were created in the last two days. Once the articles are older than two days, either remove those URLs from the news sitemap or remove the <news:news> metadata in your sitemap from the older URLs.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
  <url>
    <loc>http://www.example.org/business/article55.html</loc>
    <news:news>
      <news:publication>
        <news:name>The Example Times</news:name>
        <news:language>en</news:language>
      </news:publication>
      <news:publication_date>2008-12-23</news:publication_date>
      <news:title>Companies A, B in Merger Talks</news:title>
    </news:news>
  </url>
</urlset>

Required Tags

  • <news:news>: The parent tag of other tags in the news: namespace. Each <url> sitemap tag can only have one <news:news> tag. A sitemap may have up to 1,000 <news:news> tags. If there are more than 1,000 <news:news> tags, split your sitemap up.
  • <news:publication>: The parent tag for the name and language tags. Each <news:news> parent tag may have only one <news:publication> tag.
  • <news:name>: The name of the news publication. It must exactly match the name as it appears on your articles on news.google.com, omitting anything in parentheses.
  • <news:language>: The language of your publication. Use an ISO 639 language code.
  • <news:publication_date>: The article publication date in W3C format. Use either the "complete date" format (YYYY-MM-DD) or the "complete date plus hours, minutes, and seconds" format with time zone designator (YYYY-MM-DDThh:mm:ssTZD). Specify the original date and time when the article was first published on your site. Don't specify the time when you added the article to your sitemap.
  • <news:title>: The title of the news article.
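
The two-day rule from the best practices above lends itself to a small filter that runs whenever the news sitemap is regenerated. The following Python sketch assumes each article is a dict with a "loc" string and a timezone-aware "published" datetime; those field names and the function name recent_articles are illustrative only.

```python
from datetime import datetime, timedelta, timezone

NEWS_WINDOW = timedelta(days=2)


def recent_articles(articles, now=None):
    """Keep only articles published within the last two days."""
    now = now or datetime.now(timezone.utc)
    return [a for a in articles if now - a["published"] <= NEWS_WINDOW]


if __name__ == "__main__":
    now = datetime(2008, 12, 24, 12, 0, tzinfo=timezone.utc)
    articles = [
        {"loc": "http://www.example.org/business/article55.html",
         "published": datetime(2008, 12, 23, 9, 0, tzinfo=timezone.utc)},
        {"loc": "http://www.example.org/business/article12.html",
         "published": datetime(2008, 12, 20, 9, 0, tzinfo=timezone.utc)},
    ]
    for article in recent_articles(articles, now):
        print(article["loc"])
```

Articles dropped by the filter can stay in the generic sitemap; only their <news:news> metadata needs to go.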


Video Sitemaps and Alternatives


A video sitemap is a sitemap with additional information about videos hosted on your pages. Creating a video sitemap is a good way to help Google find and understand the video content on your site, especially content that was recently added or that we might not otherwise discover with our usual crawling mechanisms.
Video sitemap best practices
  • Video sitemaps are based on generic sitemaps
  • Don't list videos that are unrelated to the content of the host page
  • All files in the video sitemap must be accessible to Googlebot. This means that all URLs in the video sitemap:
    • must not be disallowed for crawling by robots.txt rules
    • must be accessible without metafiles and without logging in
    • must not be blocked by firewalls or similar mechanisms
    • and must be accessible on a supported protocol: HTTP and FTP (streaming protocols are not supported)

If you want to prevent spammers from accessing your video content at the <player_loc> or <content_loc> URLs, verify that any bots accessing your server are really Googlebot.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
    xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
  <url>
    <loc>https://www.example.com/videos/some_video_landing_page.html</loc>
    <video:video>
      <video:thumbnail_loc>https://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
      <video:title>Grilling steaks for summer</video:title>
      <video:description>
        Alkis shows you how to get perfectly done steaks every time
      </video:description>
      <video:content_loc>
        http://streamserver.example.com/video123.mp4
      </video:content_loc>
      <video:player_loc>
        https://www.example.com/videoplayer.php?video=123
      </video:player_loc>
      <video:duration>600</video:duration>
      <video:expiration_date>2021-11-05T19:20:30+08:00</video:expiration_date>
      <video:rating>4.2</video:rating>
      <video:view_count>12345</video:view_count>
      <video:publication_date>2007-11-05T19:20:30+08:00</video:publication_date>
      <video:family_friendly>yes</video:family_friendly>
      <video:restriction relationship="allow">IE GB US CA</video:restriction>
      <video:price currency="EUR">1.99</video:price>
      <video:requires_subscription>yes</video:requires_subscription>
      <video:uploader
        info="https://www.example.com/users/grillymcgrillerson">GrillyMcGrillerson
      </video:uploader>
      <video:live>no</video:live>
    </video:video>
    <video:video>
      <video:thumbnail_loc>https://www.example.com/thumbs/345.jpg</video:thumbnail_loc>
      <video:title>Grilling steaks for winter</video:title>
      <video:description>
        In the freezing cold, Roman shows you how to get perfectly done steaks every time.
      </video:description>
      <video:content_loc>
        http://streamserver.example.com/video345.mp4
      </video:content_loc>
      <video:player_loc>
        https://www.example.com/videoplayer.php?video=345
      </video:player_loc>
    </video:video>
  </url>
</urlset>

Required Tags

  • <video:video>: The parent element for all information about a single video on the page specified by the <loc> tag. You can include multiple <video:video> tags nested in the <url> tag, one for each video on the hosting page.
  • <video:thumbnail_loc>: A URL pointing to the video thumbnail image file. Follow the video thumbnail requirements.
  • <video:title>: The title of the video. All HTML entities must be escaped or wrapped in a CDATA block. We recommend that this match the video title displayed on the web page where the video is embedded.
  • <video:description>: A description of the video. Maximum 2048 characters. All HTML entities must be escaped or wrapped in a CDATA block. It must match the description displayed on the web page where the video is embedded, but it doesn't need to be a word-for-word match.
  • <video:content_loc>: A URL pointing to the actual video media file.
  • <video:player_loc>: A URL pointing to a player for a specific video. Usually this is the information in the src attribute of an <embed> tag.
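
Before writing a <video:video> entry, it can help to check the data against the tags listed above. The sketch below is one possible check in Python; it treats every tag in the list as required, which follows these notes rather than any confirmed rule, and the function and field names are hypothetical.

```python
# Field names mirror the tag names above without the "video:" prefix.
REQUIRED_VIDEO_FIELDS = (
    "thumbnail_loc", "title", "description", "content_loc", "player_loc",
)
MAX_DESCRIPTION_LENGTH = 2048


def validate_video_entry(video):
    """Return a list of problems found in a dict describing one video."""
    problems = []
    for field in REQUIRED_VIDEO_FIELDS:
        if not video.get(field, "").strip():
            problems.append(f"missing <video:{field}>")
    if len(video.get("description", "")) > MAX_DESCRIPTION_LENGTH:
        problems.append(
            f"<video:description> exceeds {MAX_DESCRIPTION_LENGTH} characters"
        )
    return problems


if __name__ == "__main__":
    entry = {
        "thumbnail_loc": "https://www.example.com/thumbs/123.jpg",
        "title": "Grilling steaks for summer",
        "description": "Alkis shows you how to get perfectly done steaks every time",
        "content_loc": "http://streamserver.example.com/video123.mp4",
        "player_loc": "https://www.example.com/videoplayer.php?video=123",
    }
    print(validate_video_entry(entry))
```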


Combining Sitemap Extensions


Sitemap extensions are a great way to tell Google about the different kinds of content and their metadata that you're using on your site. Often the content on your pages may fit into multiple kinds of extensions; for example, you might be publishing news articles that embed images and videos. Additionally, your pages may be localized as well, which might mean that you could add hreflang annotations for your localized pages.
Declaring Multiple Namespaces:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
           xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
           xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
           xmlns:xhtml="http://www.w3.org/1999/xhtml">
  <url><!-- rest of the sitemap -->
