Microsys
  

XML Sitemap Files in UTF-8 or ASCII Character Format

The sitemaps protocol specifies that XML sitemap documents must be UTF-8 encoded and contain no characters outside the ASCII range.

ASCII is a subset of UTF-8

The first 128 characters in UTF-8 (code points 0..127) are the same as in ASCII, and each one is stored as the same single byte.
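
The overlap can be checked with a few lines of Python. This is only a minimal sketch for illustration; the function name is_ascii_compatible is hypothetical and not part of any sitemap tool.

```python
def is_ascii_compatible(text: str) -> bool:
    """Return True if text encodes to the same bytes in ASCII and in UTF-8."""
    try:
        return text.encode("ascii") == text.encode("utf-8")
    except UnicodeEncodeError:
        # The text contains a character outside the 0..127 range,
        # so it cannot be represented in ASCII at all.
        return False


# Every one of the 128 ASCII code points is one identical byte in UTF-8.
all_ascii = "".join(chr(i) for i in range(128))
assert all_ascii.encode("utf-8") == bytes(range(128))
```

A file that passes this check can be read by both ASCII-only and UTF-8 aware parsers without any difference.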


UTF-8 documents and BOM

Some UTF-8 files start with a so-called BOM (byte order mark) that identifies the file as a Unicode UTF-8 document.

The BOM is not required for XML or UTF-8 documents. It just helps most Unicode tools handle the text correctly, although document parsers that only understand ASCII may choke on it.

The BOM for UTF-8 is the following three bytes in hexadecimal: $EF $BB $BF. To view the BOM in XML document files such as sitemaps, you will need a tool such as a hex editor.
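
If no hex editor is at hand, a short script can check for the three BOM bytes as well. The sketch below uses Python's standard codecs module; the helper names has_utf8_bom and strip_utf8_bom are hypothetical and chosen only for this example.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the raw bytes start with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if one is present."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data
```

To inspect a file, read it in binary mode, for example has_utf8_bom(open("sitemap.xml", "rb").read()), where sitemap.xml is an assumed file name.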

You can configure how the sitemap generator software creates XML sitemaps.
In Create sitemap | Document options | Character set and type you will find these options:
  • Always save sitemap files as UTF-8.
  • Save UTF-8 sitemap files with BOM.
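
To illustrate what the BOM option changes at the byte level, here is a minimal Python sketch. It is not how the sitemap generator software is implemented; the function name encode_sitemap and its with_bom parameter are assumptions made for this example.

```python
def encode_sitemap(xml_text: str, with_bom: bool = False) -> bytes:
    """Encode sitemap text as UTF-8, optionally prefixed with a BOM."""
    # Python's "utf-8-sig" codec writes EF BB BF before the content;
    # plain "utf-8" writes the content only.
    return xml_text.encode("utf-8-sig" if with_bom else "utf-8")
```

With with_bom=True the output is exactly three bytes longer, and the rest of the file is identical.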


URL Encode Non-ASCII Characters in XML Sitemaps

The sitemaps protocol specifies that all non-ASCII characters in URLs must be URL encoded, even though the XML sitemap file itself is defined as UTF-8. This causes no problems because ASCII is a subset of UTF-8. To read more, see our article about XML sitemaps URL encoding.
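
As a rough illustration of URL encoding, the Python sketch below percent-encodes the non-ASCII characters in the path and query of a URL using the standard urllib.parse module. The function name percent_encode_url is hypothetical, and the sketch makes simplifying assumptions: it leaves the host name untouched (international domain names need separate handling) and it does not perform the XML entity escaping that a sitemap file also needs.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def percent_encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query of a URL."""
    parts = urlsplit(url)
    # quote() converts each character to UTF-8 bytes and writes them as %XX.
    # "%" is kept safe so that already encoded URLs are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))
```

The result contains only ASCII characters, so it reads the same whether the sitemap file is treated as ASCII or as UTF-8.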
 © Copyright 1997-2024 Microsys