WordPress Sitemaps with Jetpack

Jetpack can automatically produce XML sitemaps for your WordPress installation. These aren’t generated as static files but are assembled dynamically on request, just like WordPress posts. While this feature generally works well, there are some annoying caveats you should be aware of.

Submit your Sitemap

Regarding sitemaps for search engines (the only kind I’m interested in), Jetpack claims there is “no need to do anything extra on your end” after enabling its sitemap option. At least for Google Search, this is not true. You must also have a Google Webmaster Tools account and manually submit your sitemap there. Otherwise Google will simply ignore it, even though it exists and correctly shows up in robots.txt.
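
Before submitting, it can help to confirm which sitemap URLs your robots.txt actually advertises. The following is a minimal Python sketch, not anything Jetpack provides: the function name sitemap_urls and the example.com addresses are made up for illustration, and the robots.txt body is assumed to be already downloaded as a string.

```python
def sitemap_urls(robots_txt: str) -> list[str]:
    """Return the URLs named in 'Sitemap:' lines of a robots.txt body."""
    urls = []
    for line in robots_txt.splitlines():
        # Drop trailing comments, then split "Field: value" at the first
        # colon only, because the URL itself contains colons.
        line = line.split("#", 1)[0].strip()
        field, sep, value = line.partition(":")
        if sep and field.strip().lower() == "sitemap":
            urls.append(value.strip())
    return urls


# Hypothetical robots.txt content, for illustration only.
example = """\
User-agent: *
Disallow: /wp-admin/
Sitemap: https://example.com/sitemap.xml
"""

print(sitemap_urls(example))
```

Whatever this prints is the URL to enter manually in the webmaster console.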

Media Titles & Captions

Jetpack generates a two-part sitemap, one for your posts (sitemap-1.xml) and one for your media gallery entries (image-sitemap-1.xml). The latter lists the simple post-like attachment views that WordPress automatically provides for each image. Here Jetpack adds an image element to the standard loc and lastmod elements, like so:

<url>
  <loc>https://news.kynosarges.org/2017/02/25/bnm-armor-weapons/</loc>
  <lastmod>2017-04-09T12:58:57Z</lastmod>
  <image:image>
    <image:loc>https://news.kynosarges.org/wp-content/uploads/BNM-Armor-01.jpg</image:loc>
    <image:title>BNM Armor+Weapons 01</image:title>
    <image:caption>Armored Figure</image:caption>
  </image:image>
</url>

That’s nice, but there is a problem: the image’s title and caption contain plain text, and Jetpack does not escape reserved XML characters! I originally had an ampersand (&) instead of the plus sign (+) in the title. Even when I entered the XML entity &amp; in the media gallery, Jetpack would output a single & in the sitemap, which makes the XML malformed.

Of course Google rejects such sitemaps: Webmaster Tools shows an error message for the first offending character. The only solution was to go through all my gallery entries that used ampersands and replace them with characters that need no escaping, such as plus signs. This is clearly a bug in Jetpack that you must work around, for now (version 4.8.2).
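
To illustrate what correct escaping would look like, and how to check a sitemap yourself before Google does, here is a small Python sketch using only the standard library. This is not Jetpack’s code; the helper names image_entry and is_well_formed are hypothetical, and the fragment is reduced to the title and caption elements from the sample above.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

# Minimal wrapper that declares the image namespace used by image sitemaps.
WRAPPER = (
    '<url xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">'
    "{}</url>"
)


def image_entry(title: str, caption: str) -> str:
    """Build an image element with &, < and > escaped in the text."""
    return (
        "<image:image>"
        f"<image:title>{escape(title)}</image:title>"
        f"<image:caption>{escape(caption)}</image:caption>"
        "</image:image>"
    )


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses as XML, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Unescaped ampersand, as in the faulty sitemap output described above.
raw = WRAPPER.format(
    "<image:image><image:title>BNM Armor & Weapons 01</image:title></image:image>"
)
# The same title passed through escape() first.
fixed = WRAPPER.format(image_entry("BNM Armor & Weapons 01", "Armored Figure"))

print(is_well_formed(raw))    # False
print(is_well_formed(fixed))  # True
```

Running is_well_formed on a downloaded copy of the sitemap is a quick way to find out whether any remaining gallery titles still break it.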

Clear your Cache

When I edited my media gallery to fix this issue, I noticed that I was not getting an updated sitemap until I explicitly turned the feature off and back on again. This should not happen since the sitemap is dynamically generated anyway. What I think was happening is that WP Super Cache kept the old sitemap around, and Jetpack failed to signal WP Super Cache that its contents had changed. Manually clearing any WordPress cache you are running should resolve the issue, although I haven’t tested that specifically.

Correcting myself immediately after posting: This new post did not show up in the sitemap despite clearing WP Super Cache. It seems Jetpack itself caches the sitemap and only updates infrequently, unless you manually toggle the feature to force a regeneration.

No More Sitemaps (2017-10-09)

After a few months, the Jetpack sitemap showed no benefit to overall visitor counts. Then, a while after another big gallery post, came a strange week where daily traffic collapsed to less than half its usual level, according to Google Analytics. Going to Google Webmaster Tools I found no relevant error messages. I did find that the sitemap (which generates one entry per gallery image!) had grown so large I couldn’t even load it myself with image previews enabled.

Tentatively I disabled the sitemap feature, and traffic rebounded to its usual levels within a day. So now I suspect that search engine bots might have been stalled by a Jetpack sitemap that was bloated by gallery images, or perhaps visitors could not get through because the server was busy serving a huge sitemap. There might be another explanation but the correlation was striking, and I had not seen any substantial benefit from the sitemap anyway.
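
If you want to gauge how far gallery images inflate a sitemap before deciding, counting its entries is straightforward. The Python sketch below assumes the sitemap text has already been downloaded and uses the standard sitemap and image namespaces; the function name count_entries and the example.com sample are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces of the sitemap protocol and the image sitemap extension.
NS = {
    "sm": "http://www.sitemaps.org/schemas/sitemap/0.9",
    "image": "http://www.google.com/schemas/sitemap-image/1.1",
}


def count_entries(sitemap_xml: str) -> tuple[int, int]:
    """Return (number of url elements, number of image elements)."""
    root = ET.fromstring(sitemap_xml)
    urls = root.findall("sm:url", NS)
    images = root.findall("sm:url/image:image", NS)
    return len(urls), len(images)


# Tiny hypothetical sitemap, for illustration only.
sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
        xmlns:image="http://www.google.com/schemas/sitemap-image/1.1">
  <url>
    <loc>https://example.com/a/</loc>
    <image:image><image:loc>https://example.com/a.jpg</image:loc></image:image>
  </url>
  <url>
    <loc>https://example.com/b/</loc>
  </url>
</urlset>"""

print(count_entries(sample))  # (2, 1)
```

A count in the thousands for a small blog would support the suspicion that the image sitemap is the bloated part.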
