About Sitemaps
Notes on sitemaps and how to best handle them for single page applications.
References
- Sitemaps.org
- W3C Datetime Format
- Tools for Validating Sitemap:
- Google Search Central: What is a Sitemap?
Notes
Sitemaps are an easy way for webmasters to inform search engines about pages on their sites that are available for crawling. In its simplest form, a Sitemap is an XML file that lists URLs for a site along with additional metadata about each URL so that search engines can more intelligently crawl the site.
Web crawlers usually discover pages from links within the site and from other sites. Sitemaps supplement this data to allow crawlers that support Sitemaps to pick up the URLs in the Sitemap and learn about those URLs using the associated metadata. Using the Sitemap protocol does not guarantee that web pages are included in search engines, but it provides hints that help web crawlers do a better job of crawling your site.
Sitemaps XML format
- The Sitemap protocol format consists of XML tags.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<url>
<loc>http://www.example.com/</loc>
<lastmod>2005-01-01</lastmod>
<changefreq>monthly</changefreq>
<priority>0.8</priority>
</url>
<url>
<loc>http://www.example.com/catalog?item=12&amp;desc=vacation_hawaii</loc>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://www.example.com/catalog?item=73&amp;desc=vacation_new_zealand</loc>
<lastmod>2004-12-23</lastmod>
<changefreq>weekly</changefreq>
</url>
<url>
<loc>http://www.example.com/catalog?item=74&amp;desc=vacation_newfoundland</loc>
<lastmod>2004-12-23T18:00:15+00:00</lastmod>
<priority>0.3</priority>
</url>
<url>
<loc>http://www.example.com/catalog?item=83&amp;desc=vacation_usa</loc>
<lastmod>2004-11-23</lastmod>
</url>
</urlset>
- All data values in a Sitemap (including URLs) must be entity-escaped, and the file itself must be UTF-8 encoded.
Character | Escape Code
---|---
Ampersand | &amp;
Single Quote | &apos;
Double Quote | &quot;
Greater Than | &gt;
Less Than | &lt;
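As a rough illustration of the escaping rule above, the following Python sketch escapes the five listed characters using the standard library. The function name `escape_for_sitemap` is made up for this example; it is only needed when writing XML by hand, since XML libraries normally escape text for you.

```python
from xml.sax.saxutils import escape

# escape() already handles &, < and >. The two quote characters are
# passed as extra entities so that all five characters are covered.
EXTRA_ENTITIES = {"'": "&apos;", '"': "&quot;"}


def escape_for_sitemap(value: str) -> str:
    """Entity-escape a data value before placing it inside a Sitemap tag."""
    return escape(value, EXTRA_ENTITIES)


print(escape_for_sitemap("http://www.example.com/catalog?item=12&desc=vacation_hawaii"))
```

The ampersand is replaced first, so the entities produced for the other characters are not escaped a second time.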
- The Sitemap must:
  - Begin with an opening <urlset> tag and end with a closing </urlset> tag
  - Specify the namespace (protocol standard) within the <urlset> tag
  - Include a <url> entry for each URL, as a parent XML tag
  - Include a <loc> child entry for each <url> parent tag
- All other tags are optional, and support for these other tags may vary among search engines.
Attribute | Required | Description
---|---|---
<urlset> | required | Encapsulates the file and references the current protocol standard.
<url> | required | Parent tag for each URL entry. The remaining tags are children of this tag.
<loc> | required | URL of the page. This URL must begin with the protocol (such as http) and end with a trailing slash, if your web server requires it. This value must be less than 2,048 characters. Note that URLs must be escaped - best to use something like
<lastmod> | optional | The date of last modification of the page. This date should be in the W3C Datetime format (see below). This format allows you to omit the time portion if desired.
<changefreq> | optional | How frequently this page is likely to change. This value provides general information to search engines and may not correlate exactly to how often they crawl the page. Valid values:
<priority> | optional | The priority of this URL relative to other URLs on your site. Valid values range from 0.0 to 1.0. The default priority is 0.5.
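The tag rules above can be sketched in Python with the standard library's `xml.etree.ElementTree`. This is a minimal sketch, not a reference implementation: `build_sitemap` and the dict-based entry shape are assumptions made for illustration, and the `changefreq` values are the ones defined by the sitemaps.org protocol.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
# changefreq values defined by the sitemaps.org protocol
VALID_CHANGEFREQ = {"always", "hourly", "daily", "weekly", "monthly", "yearly", "never"}


def build_sitemap(entries):
    """Serialize entries to Sitemap XML (bytes).

    Each entry is a dict with a required "loc" key and optional "lastmod",
    "changefreq" and "priority" keys (a hypothetical input shape).
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for entry in entries:
        loc = entry["loc"]  # <loc> is the only required child of <url>
        if len(loc) >= 2048:
            raise ValueError("URL must be shorter than 2,048 characters")
        url = ET.SubElement(urlset, "url")
        # ElementTree entity-escapes &, < and > in text when serializing.
        ET.SubElement(url, "loc").text = loc
        if "lastmod" in entry:
            ET.SubElement(url, "lastmod").text = entry["lastmod"]
        if "changefreq" in entry:
            if entry["changefreq"] not in VALID_CHANGEFREQ:
                raise ValueError(f"invalid changefreq: {entry['changefreq']!r}")
            ET.SubElement(url, "changefreq").text = entry["changefreq"]
        if "priority" in entry:
            priority = float(entry["priority"])
            if not 0.0 <= priority <= 1.0:
                raise ValueError("priority must be between 0.0 and 1.0")
            ET.SubElement(url, "priority").text = f"{priority:.1f}"
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


sitemap = build_sitemap([
    {"loc": "http://www.example.com/", "lastmod": "2005-01-01",
     "changefreq": "monthly", "priority": 0.8},
    {"loc": "http://www.example.com/catalog?item=12&desc=vacation_hawaii",
     "changefreq": "weekly"},
])
print(sitemap.decode("utf-8"))
```

Letting the XML library write the file means the ampersand in the second URL is escaped automatically rather than by hand.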
W3C Datetime Format
- Year: YYYY (e.g. 1997)
- Year and month: YYYY-MM (e.g. 1997-07)
- Complete date: YYYY-MM-DD (e.g. 1997-07-16)
- Complete date plus hours and minutes: YYYY-MM-DDThh:mmTZD (e.g. 1997-07-16T19:20+01:00)
- Complete date plus hours, minutes and seconds: YYYY-MM-DDThh:mm:ssTZD (e.g. 1997-07-16T19:20:30+01:00)
- Complete date plus hours, minutes, seconds and a decimal fraction of a second: YYYY-MM-DDThh:mm:ss.sTZD (e.g. 1997-07-16T19:20:30.45+01:00)
where:
- YYYY = four-digit year
- MM = two-digit month (01=January, etc.)
- DD = two-digit day of month (01 through 31)
- hh = two digits of hour (00 through 23) (am/pm NOT allowed)
- mm = two digits of minute (00 through 59)
- ss = two digits of second (00 through 59)
- s = one or more digits representing a decimal fraction of a second
- TZD = time zone designator (Z or +hh:mm or -hh:mm)
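A small Python sketch of two of these formats, the complete date and the complete date plus hours, minutes and seconds. The helper names `w3c_date` and `w3c_datetime` are invented for this example.

```python
from datetime import date, datetime, timedelta, timezone


def w3c_date(d: date) -> str:
    """Complete date: YYYY-MM-DD."""
    return f"{d.year:04d}-{d.month:02d}-{d.day:02d}"


def w3c_datetime(dt: datetime) -> str:
    """Complete date plus hours, minutes and seconds: YYYY-MM-DDThh:mm:ssTZD."""
    if dt.tzinfo is None:
        # Without a time zone there is nothing to write for the TZD part.
        raise ValueError("a timezone-aware datetime is required")
    # Drop fractional seconds so the output matches the hh:mm:ss form.
    return dt.replace(microsecond=0).isoformat()


cet = timezone(timedelta(hours=1))
print(w3c_date(date(1997, 7, 16)))
print(w3c_datetime(datetime(1997, 7, 16, 19, 20, 30, tzinfo=cet)))
```

For UTC timestamps `isoformat()` writes the designator as `+00:00` rather than `Z`; both forms are allowed by the TZD definition above.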
Using Sitemap Index Files
You can provide multiple Sitemap files, but each Sitemap file that you provide must have no more than 50,000 URLs and must be no larger than 50MB. If you want to list more than 50,000 URLs, you must create multiple Sitemap files. Each of these Sitemap files should be listed in a sitemap index file. Sitemap index files may not list more than 50,000 sitemaps and must be no larger than 50MB.
- The Sitemap index file must:
  - Begin with an opening <sitemapindex> tag and end with a closing </sitemapindex> tag
  - Include a <sitemap> entry for each Sitemap as a parent XML tag
  - Include a <loc> child entry for each <sitemap> parent tag
- The optional <lastmod> tag is also available for Sitemap index files.
Example:
<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
<sitemap>
<loc>http://www.example.com/sitemap1.xml.gz</loc>
<lastmod>2004-10-01T18:23:17+00:00</lastmod>
</sitemap>
<sitemap>
<loc>http://www.example.com/sitemap2.xml.gz</loc>
<lastmod>2005-01-01</lastmod>
</sitemap>
</sitemapindex>
Attribute | Required | Description
---|---|---
<sitemapindex> | required | Encapsulates information about all of the Sitemaps in the file.
<sitemap> | required | Encapsulates information about an individual Sitemap.
<loc> | required | Identifies the location of the Sitemap. This location can be a Sitemap, an Atom file, RSS file, or a simple text file.
<lastmod> | optional | Identifies the time that the corresponding Sitemap file was modified. It does not correspond to the time that the pages listed in the Sitemap were changed. This should be in W3C Datetime format.
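To illustrate the 50,000-URL limit and the index file structure, here is a hedged Python sketch that splits a URL list into Sitemap-sized chunks and writes an index for them. The function names and the `sitemapN.xml` naming scheme are assumptions for this example, and the 50MB size limit is not checked.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000
MAX_SITEMAPS_PER_INDEX = 50_000


def chunk_urls(urls, size=MAX_URLS_PER_SITEMAP):
    """Yield consecutive slices of urls, each small enough for one Sitemap."""
    for start in range(0, len(urls), size):
        yield urls[start:start + size]


def build_sitemap_index(sitemap_locs):
    """Serialize a Sitemap index (bytes) listing the given Sitemap URLs."""
    if len(sitemap_locs) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("a Sitemap index may list at most 50,000 Sitemaps")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for loc in sitemap_locs:
        sitemap = ET.SubElement(index, "sitemap")
        ET.SubElement(sitemap, "loc").text = loc
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


# Hypothetical site with 120,001 pages: three Sitemap files are needed.
urls = [f"http://www.example.com/page/{i}" for i in range(120_001)]
chunks = list(chunk_urls(urls))
locs = [f"http://www.example.com/sitemap{n}.xml" for n in range(1, len(chunks) + 1)]
print(len(chunks))
print(build_sitemap_index(locs).decode("utf-8"))
```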
The location of a Sitemap file determines the set of URLs that can be included in that Sitemap. A Sitemap file located at http://example.com/catalog/sitemap.xml can include any URLs starting with http://example.com/catalog/ but cannot include URLs starting with http://example.com/images/.
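The location rule can be expressed as a prefix check. The sketch below is one possible reading of it in Python: the function name `allowed_in_sitemap` is invented, and treating a different scheme or host as out of scope follows my understanding of the sitemaps.org protocol.

```python
from urllib.parse import urlsplit


def allowed_in_sitemap(sitemap_url: str, page_url: str) -> bool:
    """Return True if page_url may be listed in the Sitemap at sitemap_url."""
    sitemap = urlsplit(sitemap_url)
    page = urlsplit(page_url)
    # Scheme and host must match (assumption based on the protocol text).
    if (sitemap.scheme, sitemap.netloc.lower()) != (page.scheme, page.netloc.lower()):
        return False
    # Directory that contains the Sitemap file, with a trailing slash.
    base_dir = sitemap.path.rsplit("/", 1)[0] + "/"
    return (page.path or "/").startswith(base_dir)


sitemap_url = "http://example.com/catalog/sitemap.xml"
print(allowed_in_sitemap(sitemap_url, "http://example.com/catalog/show?item=23"))
print(allowed_in_sitemap(sitemap_url, "http://example.com/images/logo.png"))
```

A Sitemap at the root of the server yields `/` as its base directory, so every path on that host passes the check.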
[...]
It is strongly recommended that you place your Sitemap at the root directory of your web server. For example, if your web server is at example.com, then your Sitemap index file would be at http://example.com/sitemap.xml.
Once you have created the Sitemap file and placed it on your web server, you need to inform the search engines that support this protocol of its location. You can do this by:
- submitting it to them via the search engine's submission interface
- specifying the location in your site's robots.txt file
  - Add a line like this to the file: Sitemap: http://www.example.com/sitemap.xml
- sending an HTTP request
  - Send a request using the following scheme (replace <searchengine_URL> with the URL provided by the search engine): <searchengine_URL>/ping?sitemap=sitemap_url
  - Example: https://google.com/ping?sitemap=https://frankmbrown.net/sitemap.xml
  - A successful request will return an HTTP 200 status code.
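The ping scheme above can be assembled with the standard library. This sketch only builds the URL and does not send anything; `build_ping_url` and the `searchengine.example` host are placeholders, and percent-encoding the sitemap URL is an assumption based on my reading of the sitemaps.org protocol. Whether a given search engine still accepts such pings should be checked in its own documentation.

```python
from urllib.parse import quote


def build_ping_url(search_engine_url: str, sitemap_url: str) -> str:
    """Build <searchengine_URL>/ping?sitemap=<encoded sitemap URL>."""
    # safe="" percent-encodes ":" and "/" too, so the whole sitemap URL
    # travels as a single query parameter value.
    encoded = quote(sitemap_url, safe="")
    return f"{search_engine_url.rstrip('/')}/ping?sitemap={encoded}"


print(build_ping_url("https://searchengine.example", "https://frankmbrown.net/sitemap.xml"))
```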
What Google Says About Sitemaps
A sitemap is a file where you provide information about the pages, videos, and other files on your site, and the relationships between them. Search engines like Google read this file to crawl your site more efficiently. A sitemap tells search engines which pages and files you think are important in your site, and also provides valuable information about these files.
You can use a sitemap to provide information about specific types of content on your pages, including video, image, and news content.
Image Sitemaps
Image sitemaps are a way of telling Google about other images on your site, especially those that we might not otherwise find (such as images your site reaches with JavaScript code). You can create a separate image sitemap or add sitemap tags to your existing sitemap; either approach is equally fine for Google.
Required Tag | Description
---|---
<image:image> | Encloses all information about a single image. Each
<image:loc> | The URL of the image.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
<url>
<loc>https://example.com/sample1.html</loc>
<image:image>
<image:loc>https://example.com/image.jpg</image:loc>
</image:image>
<image:image>
<image:loc>https://example.com/photo.jpg</image:loc>
</image:image>
</url>
<url>
<loc>https://example.com/sample2.html</loc>
<image:image>
<image:loc>https://example.com/picture.jpg</image:loc>
</image:image>
</url>
</urlset>
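An image sitemap like the one above could be generated with `xml.etree.ElementTree` by registering both namespaces. This is a sketch under assumptions: `build_image_sitemap` and the page-to-images mapping are invented for the example.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
IMAGE_NS = "http://www.google.com/schemas/sitemap-image/1.1"

# Register prefixes so the output uses xmlns="..." and xmlns:image="..."
# instead of auto-generated prefixes such as ns0 and ns1.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("image", IMAGE_NS)


def build_image_sitemap(pages):
    """pages maps each page URL to the list of image URLs found on it."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for page_url, image_urls in pages.items():
        url = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(url, f"{{{SITEMAP_NS}}}loc").text = page_url
        for image_url in image_urls:
            image = ET.SubElement(url, f"{{{IMAGE_NS}}}image")
            ET.SubElement(image, f"{{{IMAGE_NS}}}loc").text = image_url
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


xml_bytes = build_image_sitemap({
    "https://example.com/sample1.html": [
        "https://example.com/image.jpg",
        "https://example.com/photo.jpg",
    ],
    "https://example.com/sample2.html": ["https://example.com/picture.jpg"],
})
print(xml_bytes.decode("utf-8"))
```

Note that `register_namespace` changes global serializer state, which is usually fine for a one-off generation script.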
News Sitemaps
If you are a news publisher, use news sitemaps to tell Google about your news articles and additional information about them. You can either extend your existing sitemap with news specific tags, or create a separate news sitemap that's reserved just for your news articles.
News Sitemap Best Practices
- Generic sitemap best practices apply to news sitemaps
- Update your news sitemap with fresh articles as they're published. Don't create a new sitemap with each update. Google News crawls news sitemaps as often as it crawls the rest of your site.
- Only include recent URLs for articles that were created in the last two days. Once the articles are older than two days, either remove those URLs from the news sitemap or remove the <news:news> metadata in your sitemap from the older URLs.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
<url>
<loc>http://www.example.org/business/article55.html</loc>
<news:news>
<news:publication>
<news:name>The Example Times</news:name>
<news:language>en</news:language>
</news:publication>
<news:publication_date>2008-12-23</news:publication_date>
<news:title>Companies A, B in Merger Talks</news:title>
</news:news>
</url>
</urlset>
Required Tag | Description
---|---
<news:news> | The parent tag of other tags in the
<news:publication> | The parent tag for the name and language tag. Each
<news:name> | The name of the news publication. It must exactly match the name as it appears on your articles on news.google.com, omitting anything in parentheses.
<news:language> | Language of your publication. Use an ISO 639 language code.
<news:publication_date> | The article publication date in W3C format. Use either the "complete date" format (YYYY-MM-DD) or the "complete date plus hours, minutes, and seconds" format with time zone designator (YYYY-MM-DDThh:mm:ssTZD). Specify the original date and time when the article was first published on your site. Don't specify the time when you added the article to your sitemap.
<news:title> | The title of the news article.
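The two-day rule from the best practices above amounts to a simple age filter applied each time the news sitemap is regenerated. A minimal Python sketch, assuming each article is a dict with a timezone-aware `published` datetime (a hypothetical shape, as is the function name):

```python
from datetime import datetime, timedelta, timezone

MAX_NEWS_AGE = timedelta(days=2)


def news_sitemap_candidates(articles, now=None):
    """Keep only articles published within the last two days.

    "published" must be timezone-aware so it can be compared with now.
    """
    now = now or datetime.now(timezone.utc)
    return [a for a in articles if now - a["published"] <= MAX_NEWS_AGE]


now = datetime(2008, 12, 24, 12, 0, tzinfo=timezone.utc)
articles = [
    {"title": "Companies A, B in Merger Talks",
     "published": datetime(2008, 12, 23, 9, 0, tzinfo=timezone.utc)},
    {"title": "Older story",
     "published": datetime(2008, 12, 20, 9, 0, tzinfo=timezone.utc)},
]
for article in news_sitemap_candidates(articles, now=now):
    print(article["title"])
```

Passing `now` explicitly keeps the filter deterministic in tests; in production it defaults to the current UTC time.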
Video Sitemaps and Alternatives
A video sitemap is a sitemap with additional information about videos hosted on your pages. Creating a video sitemap is a good way to help Google find and understand the video content on your site, especially content that was recently added or that we might not otherwise discover with our usual crawling mechanisms.
Video sitemap best practices
- Video sitemaps are based on generic sitemaps
- Don't list videos that are unrelated to the content of the host page
- All files in the video sitemap must be accessible to Googlebot. This means that all URLs in the video sitemap:
  - must not be disallowed for crawling by robots.txt rules
  - must be accessible without metafiles and without logging in
  - must not be blocked by firewalls or similar mechanisms
  - must be accessible on a supported protocol: HTTP and FTP (streaming protocols are not supported)
If you want to prevent spammers from accessing your video content at the <player_loc> or <content_loc> URLs, verify that any bots accessing your server are really Googlebot.
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
<url>
<loc>https://www.example.com/videos/some_video_landing_page.html</loc>
<video:video>
<video:thumbnail_loc>https://www.example.com/thumbs/123.jpg</video:thumbnail_loc>
<video:title>Grilling steaks for summer</video:title>
<video:description>
Alkis shows you how to get perfectly done steaks every time
</video:description>
<video:content_loc>
http://streamserver.example.com/video123.mp4
</video:content_loc>
<video:player_loc>
https://www.example.com/videoplayer.php?video=123
</video:player_loc>
<video:duration>600</video:duration>
<video:expiration_date>2021-11-05T19:20:30+08:00</video:expiration_date>
<video:rating>4.2</video:rating>
<video:view_count>12345</video:view_count>
<video:publication_date>2007-11-05T19:20:30+08:00</video:publication_date>
<video:family_friendly>yes</video:family_friendly>
<video:restriction relationship="allow">IE GB US CA</video:restriction>
<video:price currency="EUR">1.99</video:price>
<video:requires_subscription>yes</video:requires_subscription>
<video:uploader
info="https://www.example.com/users/grillymcgrillerson">GrillyMcGrillerson
</video:uploader>
<video:live>no</video:live>
</video:video>
<video:video>
<video:thumbnail_loc>https://www.example.com/thumbs/345.jpg</video:thumbnail_loc>
<video:title>Grilling steaks for winter</video:title>
<video:description>
In the freezing cold, Roman shows you how to get perfectly done steaks every time.
</video:description>
<video:content_loc>
http://streamserver.example.com/video345.mp4
</video:content_loc>
<video:player_loc>
https://www.example.com/videoplayer.php?video=345
</video:player_loc>
</video:video>
</url>
</urlset>
Required Tag | Description
---|---
<video:video> | The parent element for all information about a single video on the page specified by the
<video:thumbnail_loc> | A URL pointing to the video thumbnail image file. Follow the video thumbnail requirements.
<video:title> | The title of the video. All HTML entities must be escaped or wrapped in a CDATA block. We recommend that this match the video title displayed on the web page where the video is embedded.
<video:description> | A description of the video. Maximum 2048 characters. All HTML entities must be escaped or wrapped in a CDATA block.
<video:content_loc> | A URL pointing to the actual video media file.
<video:player_loc> | A URL pointing to a player for a specific video. Usually this is the information in the
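A quick way to catch missing required tags is to parse the video sitemap and check each `<video:video>` entry. The sketch below is illustrative only: `check_video_sitemap` is an invented name, and accepting either `<video:content_loc>` or `<video:player_loc>` (rather than demanding both) is an assumption based on my understanding of Google's documentation.

```python
import xml.etree.ElementTree as ET

NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "video": "http://www.google.com/schemas/sitemap-video/1.1",
}
ALWAYS_REQUIRED = ("thumbnail_loc", "title", "description")
MAX_DESCRIPTION_LENGTH = 2048


def check_video_sitemap(xml_text):
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    root = ET.fromstring(xml_text)
    for url in root.findall("sm:url", NS):
        page = url.findtext("sm:loc", default="(missing loc)", namespaces=NS)
        for number, video in enumerate(url.findall("video:video", NS), start=1):
            def text_of(tag):
                return video.findtext(f"video:{tag}", default="", namespaces=NS).strip()

            where = f"{page} video #{number}"
            for tag in ALWAYS_REQUIRED:
                if not text_of(tag):
                    problems.append(f"{where}: missing <video:{tag}>")
            # Assumption: one of the two location tags is sufficient.
            if not (text_of("content_loc") or text_of("player_loc")):
                problems.append(f"{where}: needs <video:content_loc> or <video:player_loc>")
            if len(text_of("description")) > MAX_DESCRIPTION_LENGTH:
                problems.append(f"{where}: description exceeds 2048 characters")
    return problems
```

Calling `check_video_sitemap` on the contents of a sitemap file returns one message per missing or oversized field, which makes it easy to run as a pre-publish check.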
Combining Sitemap Extensions
Sitemap extensions are a great way to tell Google about the different kinds of content and their metadata that you're using on your site. Often the content on your pages may fit into multiple kinds of extensions; for example, you might be publishing news articles that embed images and videos. Additionally, your pages may be localized as well, which might mean that you could add hreflang annotations for your localized pages.
Declaring Multiple Namespaces:
<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
xmlns:news="http://www.google.com/schemas/sitemap-news/0.9"
xmlns:video="http://www.google.com/schemas/sitemap-video/1.1"
xmlns:xhtml="http://www.w3.org/1999/xhtml">
<url><!-- rest of the sitemap -->