Exclude URLs When Creating XML Sitemaps
You can exclude URLs by various means, e.g. by HTTP response code. This works with all kinds of sitemaps, including HTML and XML sitemaps.
Normally, filtering of URLs is done by the site crawler during the website scan, e.g. through:
- output filters,
- filtering session IDs in URLs, and
- the robots.txt file, nofollow and noindex (see the sketch after this list).
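The following is a minimal Python sketch, not the program's own code, illustrating how a crawler can combine a robots.txt check with a naive meta robots "noindex" check to decide whether a crawled URL remains eligible for a sitemap. The example.com URLs and the helper name eligible_for_sitemap are placeholders invented for the example.

```python
# Conceptual sketch only: illustrates robots.txt and meta "noindex" filtering,
# not the actual implementation of any particular sitemap generator.
import re
import urllib.robotparser

robots = urllib.robotparser.RobotFileParser()
robots.set_url("https://example.com/robots.txt")  # placeholder site
robots.read()

# Naive check for <meta name="robots" content="... noindex ...">
NOINDEX_META = re.compile(
    r'<meta[^>]+name=["\']robots["\'][^>]+content=["\'][^"\']*noindex',
    re.IGNORECASE,
)

def eligible_for_sitemap(url: str, html: str) -> bool:
    """Return True if the URL is allowed by robots.txt and not marked noindex."""
    if not robots.can_fetch("*", url):
        return False  # disallowed by robots.txt
    if NOINDEX_META.search(html):
        return False  # page opts out via meta robots noindex
    return True
```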
Depending on the program configuration, not all URLs shown in the website
tree view will be included in generated sitemaps.
You can control whether exclusion-related filters are applied after a website scan has finished or when building sitemaps:
- Older versions:
- Scan website | Crawler options | Apply "webmaster" and "output" filters after website scan stops
- Newer versions:
- Scan website | Output filters | After website scan stops: Remove URLs excluded
- Scan website | Webmaster filters | After website scan stops: Remove URLs with noindex/disallow
- And then:
- Check Create sitemap | Document options | Remove URLs excluded by "webmaster" and "output" filters
Note: You can also edit state flags of URLs, such as "do not output", after a website crawl has finished.
In addition to general filtering, you can also exclude URLs when building sitemap files (including HTML sitemaps and XML sitemaps) based on HTTP response codes.
Generally speaking, when using the default configuration, only URLs with a valid response code are included when building sitemaps.
There are a few specific exceptions when creating HTML sitemaps, but otherwise
all unwanted URLs are left out.
Example: URLs that redirect, e.g. with response code 301 Moved Permanently, are not included when building XML sitemaps.
Which response codes the sitemap builder will accept can be set in the option
Create sitemap | Document options.
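To illustrate that behaviour, here is a short Python sketch (again, not the program's actual implementation) that keeps only URLs whose response code is in an accepted set when writing an XML sitemap. The ACCEPTED_RESPONSE_CODES set, the sample crawl_results data and the build_sitemap helper are assumptions made up for the example; they simply mirror the idea of configurable accepted response codes.

```python
# Illustrative sketch: build a sitemap from crawl results, keeping only URLs
# whose HTTP response code is in an "accepted" set.
from xml.sax.saxutils import escape

ACCEPTED_RESPONSE_CODES = {200}  # assumption: by default only 200 OK is accepted

# Hypothetical crawl results: (url, http_status) pairs collected during a scan.
crawl_results = [
    ("https://example.com/", 200),
    ("https://example.com/old-page", 301),  # redirect: excluded from the sitemap
    ("https://example.com/missing", 404),   # not found: excluded from the sitemap
]

def build_sitemap(results):
    """Return sitemap XML containing only URLs with an accepted response code."""
    lines = [
        '<?xml version="1.0" encoding="UTF-8"?>',
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">',
    ]
    for url, status in results:
        if status in ACCEPTED_RESPONSE_CODES:
            lines.append(f"  <url><loc>{escape(url)}</loc></url>")
    lines.append("</urlset>")
    return "\n".join(lines)

print(build_sitemap(crawl_results))  # only the 200 OK URL ends up in the output
```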