A1 Sitemap Generator

: Create Text, HTML, RSS and Google XML sitemaps for your websites





NOTE: If you are doing translations, you only need to translate the .ini files.



en-us
English (US)
Thomas Schulz
support#microsystools.com
https://www.microsystools.com
Please email suggestions and/or corrections.
2020-10-25 23:34:49
A1 Sitemap Generator
10.1.4


Scan was stopped:<NL><NL>The maximum number of internal URLs is limited to {dynCount} in "{dynEdition}".
Cannot execute command: {dynCommand}<NL><NL>This functionality is not available in "{dynEdition}".
Scan was stopped:<NL><NL>The maximum number of analyzed URLs is limited to {dynCount} in "{dynEdition}".
Not all pages were included:<NL><NL>The maximum number of pages in generated search engines is limited to {dynCount} in "{dynEdition}".
Not all pages were included:<NL><NL>The maximum number of pages in generated sitemaps is limited to {dynCount} in "{dynEdition}".
You cannot load projects in "{dynEdition}".<NL><NL>A new project will be created instead.
Not all content was imported:<NL><NL>The maximum number of imported lines is limited to {dynCount} in "{dynEdition}".
Not all content was exported:<NL><NL>The maximum number of exported lines is limited to {dynCount} in "{dynEdition}".
The entire list was not used:<NL><NL>The maximum number of keywords and phrases in the source keyword list is limited to {dynCount} in "{dynEdition}".
The {dynSitemapKind} file could not be created:<NL><NL>Only standard text, RSS feed, HTML template and XML sitemap files are supported in "{dynEdition}".
The {dynSitemapKind} file could not be created:<NL><NL>Only standard XML sitemap files are supported in "{dynEdition}".
To use this function you need to configure option: <NL>{dynOption}
Title
Description
Count {dynCount}
At lines
With anchors
With "alt" attribute
With follow
Weight {dynCount}
Merged {dynCount}
{dynEngine} # related
{dynEngine} # searched
{dynEngine} # score
Crunching data. This can take time!
While checking the website root address, we found a possible problem that can affect website crawling:
The response "{dynResponse}" indicates that the computer running A1 Sitemap Generator has not enabled TLS 1.1 / TLS 1.2. This is now often required for HTTPS.
In "Scan website | Crawler engine" switch to "HTTP using Indy". Then refer to the online help if you have problems getting support for HTTPS websites working.
When using Windows 7 make sure you have updated to SP1 or newer. Enable TLS 1.1 and TLS 1.2 in Windows/IE internet settings at "Tools - Internet Options - Advanced - Security"
The response "{dynResponse}" indicates connection was blocked - usual causes include internet traffic filtering / firewall software (make sure A1 Sitemap Generator is allowed), DNS/domain issues (check for typing errors) or SSL/TLS configuration.
The response "{dynResponse}" indicates the webserver blocked access - usual solutions include lowering simultaneous connections count and changing the user agent ID inA1 Sitemap Generator.
The website requested the page to be ignored - check how you are using "robots.txt" and "noindex" in HTML source and HTTP headers or configure A1 Sitemap Generator to ignore those.
Link
Source
Direct
Unknown
Linked from {dynCount} pages
Links to {dynCount} pages
Used from {dynCount} pages
Uses {dynCount} URLs
Located {dynCount}
Description
Response code
Kind
Mime
Charset
special
Referenced from
Redirected from
Located {dynCount}
Phrase
Website
Search engine
Weight
Count
Count
no stop words used
auto detect language
Page
Site
Builtin
Usermade
Weighted % keyword density match in content
Top words and phrases in weighted keyword density
Google page backlinks
Google counted backlinks to URL
Title in page source
HTML errors
W3C HTML errors
Title in search engine
Title in search engine result page
Description in search engine
Description in search engine result page
Search results in search engine
Google indexed
Google indexed URLs count
Some services and search engines have terms, policies and measures against non-manual position checking and website scraping.<NL><NL>Consider getting legal advice on whether your intended usage is considered "fair use" in your jurisdiction.<NL><NL>This program enforces some measures (e.g. idle time between requests) to lessen the "load" on such services and search engines.<NL><NL>All search engines are checked concurrently. This means that the "idle" time only has a moderate effect on the overall speed.<NL><NL>It is possible to configure existing services (e.g. further increasing the time between requests) and to add new services you want to check.<NL><NL>Position check now?
Some services and search engines have terms, policies and measures against non-manual position checking and website scraping.<NL><NL>Consider getting legal advice on whether your intended usage is considered "fair use" in your jurisdiction.<NL><NL>This program enforces some measures (e.g. idle time between requests) to lessen the "load" on such services and search engines.<NL><NL>It is possible to configure existing services (e.g. further increasing the time between requests) and to add new services you want to check.<NL><NL>Analyze top positions on the selected search engine now?
Some services and search engines have terms, policies and measures against non-manual usage and website scraping.<NL><NL>Consider getting legal advice on whether your intended usage is considered "fair use" in your jurisdiction.<NL><NL>This program enforces some measures (e.g. idle time between requests) to lessen the "load" on such services and search engines.<NL><NL>Run analysis now?
Some services and search engines have terms, policies and measures against non-manual usage and website scraping.<NL><NL>Consider getting legal advice on whether your intended usage is considered "fair use" in your jurisdiction.<NL><NL>This program enforces some measures (e.g. idle time between requests) to lessen the "load" on such services and search engines.<NL><NL>All services are checked concurrently. This means that the "idle" time only has a moderate effect on the overall speed.<NL><NL>It is possible to configure existing services (e.g. further increasing the time between requests) and to add new services you want to check.<NL><NL>Retrieve suggestions now?
You have selected to integrate with an external online service during the website scan.<NL><NL>Some services have terms, policies and measures against non-manual usage and website scraping.<NL><NL>Consider getting legal advice on whether your intended usage is considered "fair use" in your jurisdiction.<NL><NL>This program enforces some measures (e.g. idle time between requests) to lessen the "load" on such services.<NL><NL>Continue?
You have selected to integrate with an external online service during the website scan: "{dynOnlineService}"<NL><NL>Some services have terms, policies and measures against non-manual usage and website scraping.<NL><NL>Consider getting legal advice on whether your intended usage is considered "fair use" in your jurisdiction.<NL><NL>This program enforces some measures (e.g. idle time between requests) to lessen the "load" on such services.<NL><NL>Keep this setting?
Some services and search engines have terms, policies and measures against non-manual usage and website scraping.<NL><NL>Consider getting legal advice on whether your intended usage is considered "fair use" in your jurisdiction.<NL><NL>This program enforces some measures (e.g. idle time between requests) to lessen the "load" on such services and search engines.<NL><NL>The process can take some time before it finishes.<NL><NL>Continue?
This file is named with a different convention than expected.<NL><NL>The normal name pattern is "{dynDirFilePathName}".<NL><NL>Use this file?
You need to select or enter a web page URL into "Active address".
You need to select or enter a keyword phrase into "Active keyword phrase".
You need to enter one or more URLs into "Addresses to find".
You need to enter one or more keywords into "Phrases to check".
You need to perform a website scan first.<NL><NL>When you save and load projects, this also includes all data related to and fetched from website scans.
The list of imported URLs has been placed in the "{tabExternal}" tab.<NL><NL>Do you want to enable the following options:<NL>- {optionRecrawl}<NL>- {optionExternal}<NL><NL>These are useful if you are planning on running a scan to check all the imported URLs.
You need to crawl the website with appropriate settings first.<NL><NL>In "{dynScanWebsite}" - "{dynQuickPresets}" select "{dynQuickSelect}".<NL><NL>Then start a new website scan.<NL><NL><NL>Note: Also make sure you are not blocking media from crawlers, e.g. through a "http://example.com/robots.txt" file on server.
You can usually get much better results if you crawl the website with appropriate settings first.<NL><NL>In "{dynScanWebsite}" - "{dynQuickPresets}" select "{dynQuickSelect}".<NL><NL>Then start a new website scan.<NL><NL><NL>Note: Also make sure you are not blocking media from crawlers, e.g. through a "http://example.com/robots.txt" file on server.
There was a problem accessing the file at location "{dynDirFilePathName}". The sitemap file will not be created.<NL><NL>To resolve the problem, try building the sitemap again.
The directory part of the sitemap output file path "{dynDirFilePathName}" does not exist.<NL><NL>Attempt to create directory now?
Could not create "{dynDirFilePathName}".<NL><NL>This could be due to missing or insufficient write access permission.
The path of app data files has to be local to the app when in debug mode and saving sitemap templates
All template names starting with "a1sg-" are reserved for default templates. Please choose another name for your sitemap template. Then try saving again!
Root directory path "{dynDirFilePathName}" needs to end with a slash. Use entered path?<NL><NL>Click "Yes" to add a trailing slash to the root directory path entered.<NL>Click "No" to have the path added as a "start scan path" and afterwards cut off to the nearest directory.
It seems the "root path" chosen is not listed in the "root path aliases" list.<NL><NL>Remove the old "alias" values?
You have enabled option:<NL>"{dynOptionPath1}"<NL><NL>However, with this enabled, you also need to enter a valid path into:<NL>"{dynOptionPath2}"<NL><NL>Disable option:<NL>"{dynOptionPath1}"<NL>?
You have enabled option:<NL>"{dynOptionPath1}"<NL><NL>However, with this enabled, you also need to enter a valid path into:<NL>"{dynOptionPath2}"<NL><NL>Disable option:<NL>"{dynOptionPath1}"<NL>?
You have enabled option:<NL>"{dynOptionPath1}"<NL><NL>However, with this enabled, you also need to configure either:<NL>{dynOptionPath2}<NL> or <NL>{dynOptionPath3}<NL> or <NL>{dynOptionPath4}<NL><NL>Disable option:<NL>"{dynOptionPath1}"<NL>?
Add word "{dynOptionWord}" to dictionary file "{dynOptionPath}"?
{dynCount} characters. Text percentage: {dynPercent}% of characters.
Analyzed / Found
URLs
Sitemap
Internal
External
Tip: Control the search index size (quality) in the "Analyze text content" tab, section "site analysis"
Edit Cell
&amp;OK
&amp;Cancel


New project
Open project...
Open project
Ctrl+O
Save project
This will save both crawl data and project configuration
Ctrl+S
Save project as...
This will save both crawl data and project configuration
Save project configuration (only) as...
This will only save project configuration
Update file
Update file
Printer setup...
Print
Print
Ctrl+P
Exit
Export selected data to file...
Export data in selected control into a file
Export selected data to clipboard
Export data in selected control into clipboard
Import URLs and data from file using "smart" mode...
Import a list of URLs and data where the program detects whether most of the URLs are internal and belong to the same domain
Import URLs and data from file into "external" list...
Import a list of URLs and data where the program assumes that most of the URLs are from different domains and belong in "external"
Cut
Cut selected text
Ctrl+X
Copy
Copy selected text
Ctrl+C
Copy path (URL)
Copy selected path (usually URL)
Paste
Paste text from clipboard
Ctrl+V
Delete
Delete selected text
Ctrl+Del
Select all
Select all text in active control
Ctrl+A
Undo
Undo last change
Ctrl+Z
Search...
Find text
Ctrl+F
Find next
Find next
F3
Replace text...
Replace text
Ctrl+R
Toggle word wrap
Add item
Add item after selected
Ctrl+Alt+R
Insert item
Insert row before selected
Ctrl+Alt+I
Add child item
Add child in selected
Ctrl+Alt+C
Delete item
Delete selected item
Ctrl+Alt+E
Add selected paths minus sub-paths to "limit-to" in both "output" and "analysis" filters
Add selected paths plus sub-paths to "limit-to" in both "output" and "analysis" filters
Add selected paths plus sub-paths to "limit-to" in both "output" and "analysis" filters
Add selected paths minus sub-paths to "exclude" in both "output" and "analysis" filters
Ctrl+Alt+F
Add selected-paths plus sub-paths to "exclude" in both "output" and "analysis" filters
Ctrl+Alt+F
Add selected-paths plus sub-paths to "exclude" in "analysis" filters
Move item up
Move selected item up
Ctrl+Alt+U
Move item down
Move selected item down
Ctrl+Alt+D
Sort
Sort
Expand all
Collapse all
Prior
View prior page
Shift+Ctrl+Z
Next
View next page
Shift+Ctrl+X
Refresh
Except when editing source, this clears the cache and refreshes the content
F5
Navigate viewer
Associate "Prior, Next and Refresh" with embedded control (e.g. Internet Explorer window)
Ctrl+Alt+N
Found items
Path on disk
Response code
Response code description
URL content state flags
Crawler engine state flags
Response time
Download time
File size
Word count
Text content percentage
MIME type
Charset of page URL
Language code
Last modified
Click-to navigation count
Linked to by count
Linked to by anchors
Linked to by list
Used by alt text
Used as alternate hreflang
Used by list
Redirected to by list
Links to internal count
Links to external count
Uses internal list
Uses internal as alternate hreflang
Uses external list
Uses external as alternate hreflang
Redirects count
Redirects to path
Redirects to response code
Redirects to path (final)
Redirects to response code (final)
Importance score calculated
Importance score scaled
HTML validation errors
CSS validation errors
Spelling validation errors
Indexed in search engines
Page content search results
Page title
Page description
Page keywords
Page H1
Page H2
Title characters count
Title pixels width
Description characters count
Description pixels width
Page content keywords (weighted density)
Page content "fuzzy duplicates" (visual view)
Page content checksum
Page content "fuzzy quality"
Screenshot of page
Impressions
Clicks
CTR
Backlinks score
Internet Explorer
Opera
Firefox
Chrome
Safari
Show URLs as "tree" versus as "list"
Switch between if found website items are shown in "tree" or "list" format
Data filter
When "pressed" filter options and filter text are applied to limit the URLs and data visible
Simplified easy mode
Toggles between "easy" and "advanced" configuration mode
Reset to standard: Match filter-text against content in all visible columns
Reverse results of all filtering (visible becomes invisible and vice versa)
Only show URLs with duplicate titles
Only show URLs with duplicate title when applying filter
Only show URLs with duplicate descriptions
Only show URLs with duplicate META description when applying filter
Only show URLs with duplicate keywords
Only show URLs with duplicate META keywords when applying filter
Only show URLs with duplicate H1 tags
Only show URLs with duplicate H1 title when applying filter
Only show URLs with duplicate H2 tags
Only show URLs with duplicate H2 title when applying filter
Only show URLs with "file size" &gt; filter-text-as-number
Only show URLs with any filter-text-number found in "response code" column
Apply "quick filter" on response code related columns - example: 301 302 307
Only show URLs with all [filter-text] found in "URL state flags" column
Apply "quick filter" on "URL state flag" column - example: "+[noindex] -[canonicalredirect]"
Only show URLs with "file size" &lt; filter-text-as-number
Only show URLs with "title characters count" &gt; filter-text-number
Show URLs as stored in data
Only show URL pages with "word count" &lt; filter-text-as-number
Only show URLs with "H1 characters count" &lt; filter-text-number
Only show URL pages with "word count" &gt; filter-text-as-number
Only show URLs with "H2 characters count" &lt; filter-text-number
Only show URLs with "title characters count" &lt; filter-text-number
Only show URLs with "title pixels count" &gt; filter-text-number
Only show URLs with "description characters count" &lt; filter-text-number
Only show URLs with "description characters count" &gt; filter-text-number
Only show URLs with "description pixels count" &gt; filter-text-number
Only show URLs that are images
Only show URLs that are pages
Only show URLs where "linked-by" or "used-by" miss anchors / "alt"
Only show URLs where "used-by" "alternate-hreflang" conflicts
Show URLs using real characters
Show URLs percentage encoded %
Allow big URL lists in data columns
A URL can have many pages that link to it, making the user interface hard to use. This option is mostly used when exporting data.
Allow relative paths inside URL lists in data columns
When showing lists of URLs in data columns you can have internal URLs use relative paths
Only show page URLs inside URL lists in data columns
Text filter uses regex syntax
Show only URLs with "not found" (404) - and relevant data columns
Show only URLs with "redirect" (301 and 302) - and relevant data columns
Show only URLs with "canonical" to other - and relevant data columns
Show only pages with duplicate titles - and relevant data columns
Show only pages with duplicate H1s - and relevant data columns
Show only pages with duplicate H2s - and relevant data columns
Show only pages with duplicate descriptions - and relevant data columns
Show only pages with "noindex" - and relevant data columns
Show only URLs robots.txt disallowed or noindexed - and relevant data columns
Show only pages with titles over 70 characters - and relevant data columns
Show only pages with titles over 510 pixels (Google) - and relevant data columns
Show only pages - and most important SEO data columns
Show only pages - and importance / link juice flow data columns
Show only images where some "linked-by" or "used-by" miss anchors / "alt"
Show only pages where "used by" "alternate hreflang" have conflicts
Show only pages - and content related data columns
Format and strip whitespaces
Remove HTML comments
Syntax highlight document
Recalculate importance data
Recalculate summary data (extended)
Recalculate URLs "excluded" from "Create robots.txt" file
Spell check document
F7
Add word to "xxx--custom-user.dic" (global)
Ctrl+Alt+G
Add word to "xxx--custom-user-project.dic" (project)
Ctrl+Alt+P
Open and show data
Calculate summary data (extended)
Always open created sitemaps
Fast storage of analysis data
Suggested unless you wish to view/edit XML files
Include extended analysis data
Useful if you later want to load project and see data retrieved during website crawl
Limit analysis data to "sitemap"
Useful if you are only interested in files within "sitemap"
Links "reduce"
If link to same URL is found multiple times on same page, "link juice" gets decreased for each link
Links "noself"
If a page contains links to itself, those links are ignored
Title Font...
Description Font...
Reset to defaults
CSV: Export data with column headers
CSV: Export with website URL at top
CSV: Wrap data cells with line breaks in ""
Wrap line breaks in "" instead of replacing them with whitespace
Export unicode with BOM
Export as UTF-8 (unicode)
Export as UCS-2 / UTF-16 LE (unicode)
Export as ANSI (local codepage)
Readme...
View the readme file
Program log
View the program log
Online help
View the online help
F1
Offline help
View the offline help provided
Tips...
View tips
Updates...
Check for updates
Buy program now...
Buy the program (opens online purchase page)
Unlock program...
Unlock the software using the instructions you received after purchasing
Send feedback
Website
Like and share
Like and share program on social websites such as Facebook, Google+ and Twitter
Newsletter
Subscribe to our newsletter
Support forums
About...
A1 Sitemap Generator
Homepage for sitemap generator tool
A1 Website Analyzer
Homepage for website analyzer tool
A1 Keyword Research
Homepage for keyword research tool
A1 Website Search Engine
Homepage for website search engine tool
A1 Website Download
Homepage for website download tool
A1 Website Scraper
Homepage for website scraper tool


Structure
Styles
Restore default settings
Google video sitemap
Google video sitemap (website has videos hosted externally)
Google image sitemap
Google image sitemap (website has images hosted externally)
Google mobile sitemap
Google code sitemap
WordPress CMS/blog website
Joomla CMS website (with crawler throttling)
Magento CMS website
phpBB forum website
SMF forum website
VBulletin forum website
XenForo forum website
SharePoint website
Wiki website
Clear file types in "output filters" (download)
Clear file types in "output filters"
Add image files
Add video files
Add audio files
Add compressed files
Add executable files
Add document files
Add common "safety" filters for use with login
Restore defaults
HTML template sitemap : HTML
HTML template sitemap : CSV
XML Sitemap : XML Sitemaps Protocol (Google Sitemaps)
XML Sitemap : Google News Sitemaps
Restore defaults
example.com
Change window size
Change window style
Data reports (configures settings, columns and filters)
Data URL encoding
Data columns
Data filter options
Core data
URL references
Importance ranks
Validation results
Extracted content
Imported data
Enable all
Disable all
Enable all
Disable all
Enable all
Disable all
Enable all
Disable all
Enable all
Disable all
Enable all
Disable all
&amp;File
&amp;Edit
&amp;Table
&amp;View
T&amp;ools
O&amp;ptions
&amp;Help
Reopen Project
Program options
Project options
After website scan
After sitemap build
Save and load
URL importance algorithm
Font Pixel Calculations
Data import/export
To adjust the filter: Check an option from one or more of the underneath groups
To apply the filter: Click the "funnel" button, so it appears "pressed down" active


Scan website
Analyze website data
Keyword tools
Online tools
Search engine builder
Create sitemap
Create robots.txt
View files
Upload files
Ping sitemap
View website
General options and tools
Project info
tsDevCtrls
tsDevCtrls2
tsDevCtrls3
Recrawl (full)
Enable when you want to "recrawl" already found and new website pages (uses current website scan data if any)
Recrawl (listed + redirects)
Enable when you only want to "recrawl" existing found URLs and those redirected to
Resume (full)
For complete "resume" website crawl (uses current website data if any)
Resume (fix errors)
For error fix "resume" website crawl (uses current website data if any)
Start scan
Stop scan
Quick presets...
View various configuration examples (can speed up creating new projects)
Paths
Scan progress
Download options
Crawler options
Crawler engine
Crawler login
Webmaster filters
Analysis filters
Output filters
Data collection
Default path type and handler
If the crawler encounters an unknown link / reference "context" it falls back to default
HTTP using "Indy" library - uses "OpenSSL" for HTTPS - see "General options"
HTTP using Windows API WinInet - uses Windows internet settings when applicable
HTTP using Windows API + embeddable system browser - supports AJAX (content fetched after initial page load)
HTTP using Mac OS API + embeddable system browser - supports AJAX (content fetched after initial page load)
HTTP using Mac OS API
Local disk / UNC / Local Area Network
Auto detect
Auto detection is based on root path type
HTTP proxy settings (if supported depends on what functionality is used and which "crawler engine" is selected)
In most cases, you can ignore proxy settings.
DNS name / IP address
Port
Username
Password
Number of 1/1000 seconds "crawl-delay" between active connections
Number of 1/1000 seconds to wait before "connect" times out
Number of 1/1000 seconds to wait before "read" times out
Number of 1/1000 seconds between failed and new connection attempts
Number of max connection attempts to a resource before giving up
Default to GET for requests where content will be analyzed (instead of HEAD followed by GET)
Depending on website and webserver there may be a performance difference between the two choices
Default to persistent connections
Persistent connections can be an advantage with webservers that dislike many connects/disconnects
Accept-Language HTTP request header to send (if left empty, none is sent)
Additional HTTP request headers to send (if left empty, none is sent)
Store redirects, links from and to all pages, statistics etc. data
Used for viewing where files are linked, redirected etc. from. This data is also used to improve various calculations
Store additional details (e.g. which line in URL content a link is placed)
Used for viewing extensive details
Store found external URLs
It can sometimes be useful to view found "external" links
Perform keyword analysis of all pages
Inspect URLs to detect the language if it cannot otherwise be identified
Store titles for all pages
Store "meta" description for all pages
Store "meta" keywords for all pages
Store H1 and H2 for all pages
Store anchor text for all links
Store "alt" attribute of all "uses" (e.g. "img" HTML tag)
Store and use "fallback" tags for title and description if necessary
Examples of "fallback" data include "og:description" meta tag
Website domain address and/or root directory path
<#32>e.g. "http://example.com/" or "http://example.com/directory/". In most cases, this is the only field you will need to fill
Session, user and custom variables in links
Session variables can sometimes be in URLs: "demo.php;sid=xxx" or "demo.php?PHPSESSID=xxx". Checks are case sensitive
Directory index file names
Some websites use duplicate URLs such as "http://example.com/dir/" and "http://example.com/dir/index.html" without redirects
Consider internal file paths and URLs case sensitive
If you know the host is running Windows, you may want this unchecked
Crawl error pages (e.g. response code 404)
This can be useful in some rare cases, typically with content management systems and the like
Verify external URLs exist (and analyze if applicable)
Verifying (and possibly analyzing) external links can slow the website scan process if there are many dead links
Follow links (if not otherwise instructed)
Determines if crawler will follow links by default (you may not want that if you already seeded the crawler with a list)
Follow redirects
Determines if crawler will follow redirects and crawl URLs redirected to
Consider 0 second meta refresh for a redirect
If a page contains a 0 second meta refresh to a different URL, should it be considered a kind of redirect
Allow cookies
Allow GZip/deflate for data transfers
Max simultaneous connections (data transfer)
More simultaneous connections are not always faster. The connection load and speed between you and the server is important
Max worker threads (transfer, analysis etc.)
This can be a (much) higher number than simultaneous connections
Advanced engine settings
Tracking and storage of extended website data (uncheck for large sites)
Saving extended website data increases memory usage and can hurt crawler performance
W3C online validation integration (slows website scan)
Spell check and dictionary files
"Edit" button is only enabled for "xxx--custom-user.dic" files
Enable spell checker (select dictionary files in "General Options | Tool Paths")
Remember to select dictionary files in "General options" - "Tool paths"
Search custom strings, code and text patterns in pages using regular expressions (case is ignored)
All entries must have format "name-id=regular expression"
Selected regular expression: Capture and store what was found and "matched"
Selected regular expression: Strip HTML and other code before searching
Check where URLs are ranked in selected search engine (slows website scan)
Search for page titles in a search engine. Special values: 0=not-checked (e.g. images), 250=max-value, 251=not-found, 255=error
Enable checking which URLs are indexed
User agent ID
Some websites may return different content depending on crawler / user agent.
Primary
For rare situations where random user agent IDs should be used.
Secondary (normally not used)
Windows browser login before crawl (WinInet HTTP crawler engine only)
Supports "basic authentication". Remember to allow cookies.
Open embedded browser and login before crawl
Crawler login: *basic authentication* and similar (Indy HTTP crawler engine only)
Supports "basic authentication". Remember to allow cookies.
Crawler login: Standard "Post" form
Send HTTP headers on each request
User
Password
Login parameters such as "user" and "password" to send (post form)
Login path (post form)
Crawler modes and options
Create log file of website scans (slows website crawl drastically)
Placed in program user data directory "logs - misc"
Store non-HTTP references in "external" (e.g. "ftp:" and "news:")
Store response headers text for all pages
Store response content text for all pages
Store screenshots of all pages with widths 480/720/1080 and only if the selected crawler engine is browser based
Validate HTML using W3C validator (select a number above 0 to enable)
Set the maximum amount of simultaneous connections to use with this tool
Validate CSS using W3C validator (select a number above 0 to enable)
Set the maximum amount of simultaneous connections to use with this tool
Search all "link" and "source" tag types
Extend search to include: &lt;style&gt;, &lt;img src=""&gt;, &lt;script src=""&gt;, &lt;link href=""&gt; etc.
Try search all &lt;form&gt; and related tags (problematic with forms that generate many unique URLs)
Extend search to include combinations of: &lt;form&gt;, &lt;input&gt;, &lt;select&gt; etc.
Always scan directories that contain linked URLs or files
This setting ensures that directories are always scanned (even if they themselves have no direct links)
Automatically adjust the crawler user agent ID if blocked by the webserver (e.g. HTTP response 403)
Some webservers automatically block unknown website crawlers through response code "403 - forbidden"
Fix "internal" URLs with default port number explicitly defined
Example: If scanning "http://example.com/", the website crawler engine will also accept "http://example.com:80/" as internal
Fix "internal" URLs with "www." or "/" incorrect compared to website root
Example: If scanning "http://example.com/", the website crawler engine will also accept "http://www.example.com" as internal
Fix "internal" URLs with protocol incorrect compared to website root
Example: If scanning "http://example.com/", the website crawler engine will also accept "https://www.example.com/" as internal
Fix "internal" URLs if website root URL redirects to a different address
Example: If scanning "http://example.com/", the website crawler engine will also accept "https://www.example.com/" as internal
Fix URL "mime" types when server is returning obvious wrong data
Example: If server returns mime type "text/html" for URLs like "example.jpg" and "example.png"
Fix URL redirect response codes if the URL redirects to itself
Ensure URLs get percentage decoded before further processing
If you want to keep the original URLs, even if they use non-standard percentage encoding, uncheck both encode and decode options
Ensure URL "path" component is percentage encoded
Ensure URL "query" component is percentage encoded
Try search inside "FlashVars"
Handle "FlashVars" URLs relative to page *or* Flash player URL location
Try search inside Javascript (both embedded and in separate files)
This will attempt to find text and links in all script sections and files
Try search inside JSON
This will attempt to find text and links in JSON files and content
Try search inside Flash (see "General options" - "Tool paths")
This will attempt to find and guess links inside Flash files
Try search inside PDF (see "General options" - "Tool paths")
This will attempt to find and guess links inside PDF files
Try search inside all XML content (and not just XHTML)
Will try to extract URLs from e.g. XML sitemap files and RSS feeds
Try search for video content in links and related URLs
Can often be useful to increase discovery of video content
Use special "response" codes for when page URLs use canonical or similar
For convenience, "special" codes can be assigned if URLs use e.g. canonical, meta refresh and other similar states
After website scan stops: Remove URLs excluded by "output filters" instead of tagging them
Strips URLs blocked by "output filters"
After website scan stops: Remove URLs with noindex/disallow instead of tagging them
Strips URLs blocked by "robots.txt" and "noindex"
Download "robots.txt"
Always download "robots.txt" to identify as a crawler/robot
Obey "robots.txt" file "disallow" directives for "*" any user agent
The file "robots.txt" is often used by webmasters to "guide" all crawlers/robots
Obey "robots.txt" file "disallow" directives specificly for *this* program
This file is often used by webmasters to "guide" specific crawlers/robots
Obey "robots.txt" file "crawl-delay" directive
Obey "meta" tag "robots" noindex
Obey "meta" tag "robots" nofollow
Obey "a" tag "rel" nofollow
Obey "link" tag "rel" canonical
Search for and add found "robots.txt" file to scan results
Search for and add found "sitemap protocol" files to scan results
Ignore URLs that "repeat parts of themselves" in generated links
Ignore URLs like "http://example.com/?paramA=&amp;amp;paramA" and "http://example.com/0/0/0" (i.e. URLs that repeat themselves)
Max characters in internal links
Cut out the following variables (e.g. session related) in internal URLs and links
Consider non-redirected index file name URLs as "duplicates"
Consider non-redirected with-slash and non-slash URLs as "duplicates"
The "real" URL of the two is determined by which is linked the most in the website
Cutout "?" (GET parameters) in internal links
Removes "?" in links and thereby also determines if "page.htm?A=1" and "page.htm?A=2" are considered to be "page.htm"
Cutout "#" (address within page) in internal links
Determines if "page.htm#A" and "page.htm#B" are considered to be the same page
Cutout "#" (address within page) in external links
Determines if "page.htm#A" and "page.htm#B" are considered to be the same page
Correct "\" when used instead of "/" in internal links (only applied in HTTP scan mode)
Corrects e.g. "folder\sub" to "folder/sub" in all links (only applied in HTTP scan mode)
Correct "//" when used instead of "/" in internal links
Corrects e.g. "folder//sub" to "folder/sub" in all links
Correct links that start with "www." by adding a HTTP protocol in front
Consider &lt;iframe src="example.html"&gt; for a "link" instead of a "source"
&lt;iframe&gt; is always considered "source". However, sometimes also as "link" can be useful
Root path aliases
Used to cover http/https/www variations and addresses mirroring / pointing to the same content
Limit output of URLs to those with "MIME content type" in list
Listing will also be done if no MIME type is returned
Limit output of URLs to those that match as "relative path" OR "text" OR "regex" in list
Text/string matches: "mypics". Path relative to root: ":mypics/", subpaths only: ":mypics/*". Regex search: "::mypics[0-9]*/".
Limit output of internal URLs to those below depth level
Depth level: "-1" = no limits. "0" = root domain/directory. "1", "2", "3" ... = all paths below chosen directory depth level.
Website directory path depth level
Exclude output of URLs that match as "relative path" OR "text" OR "regex" in list
Text/string matches: "mypics". Path relative to root: ":mypics/", subpaths only: ":mypics/*". Regex search: "::mypics[0-9]*/".
Beyond website root path, initiate scanning from paths
Useful in cases where the site is not crosslinked, or if "root" directory is different from e.g. "index.html"
Import a list of URLs from a text file to use as additional start search paths. (To check external links use "File - Import")
Import a list of URLs from a website page URL (To check external links use "File - Import")
Add a list of common "feeder" URLs like XML sitemaps and RSS feeds
Import a list of URLs from Google "site:example.com" query
Import a list of URLs from Bing "site:example.com" query
General settings for options in "output filters"
Limit output of internal URLs to response codes
Save files crawled to disk directory path
Remove files in destination folder before website download begins
To avoid accidental deletions this option is only enabled for certain temporary directories
Configure and test what to scrape from pages during website scan
Miscellaneous options
General scraping options
Extract data into CSV file at path
Extract data into XML file at path
Extract data into RSS file at path
Extract data into SQL file at path (using selected SQL format)
Execute command line after scrape. MySQL import example: "c:\mysql\mysql.exe" -u root dbname &lt; "c:\scraped.sql"
SQL syntax and format
MySQL
MS SQL
Define regular expressions (how to extract)
Load raw input (to test scraping works)
Define xpath expressions (how to extract)
Insert HTML source from URL
Sync output options with regex
Define output data format (CSV)
Insert various test examples
Scrape "Test input" to "Test output"
Regex test tool
Scan data
Scan state :
Time used :
Samples of internal "sitemap" URLs with content analyzed
Internal "sitemap" URLs
Listed found :
Listed deduced :
Analyzed content :
Analyzed references :
External URLs
Listed found :
Jobs waiting in crawler engine
"Init" found link (check if unique) :
"Analyze" found URL (or consider it) :
Jobs done in crawler engine
"Init" found link (check if unique) :
"Analyze" found URL (or consider it) :
Limit page analysis of URLs to those with "MIME content type" in list
Analysis will also be done if no MIME type returned
Limit page analysis of URLs to those with "file extension" in list
Directories are always analyzed
Case sensitive comparisons
Determines if all filters are case sensitive (e.g. if ".extension" also matches ".EXTENSION")
Limit output of URLs to those with "file extension" in list
Leave empty to include all file extensions
Case sensitive comparisons
Determines if all filters are case sensitive (e.g. if ".extension" also matches ".EXTENSION")
Always accept URLs with no file extensions (e.g. directories)
Limit analysis of URLs to those that match as "relative path" OR "text" OR "regex" in list
Text/string matches: "mypics". Path relative to root: ":mypics/", subpaths only: ":mypics/*". Regex search: "::mypics[0-9]*/".
Limit analysis of internal URLs to those below depth level
Depth level: "-1" = no limits. "0" = root domain/directory. "1", "2", "3" ... = all paths below chosen directory depth level.
Website directory path depth level
Limit analysis of content of URLs to equal/below byte size
Use this to avoid parsing page URLs hundreds of megabytes large (0 = no limit on size)
Only analyze content in page URLs up/equal to byte size
General settings for options in "analysis filters"
Exclude analysis of URLs that match as "relative path" OR "text" OR "regex" in list
Text/string matches: "mypics". Path relative to root: ":mypics/", subpaths only: ":mypics/*". Regex search: "::mypics[0-9]*/".
Webmaster crawler filters
Website "crawler traps" detection
Website links structure
Path
Full or relative URL path
Download Path
Path on disk (only shown for paths that had to be converted during download)
Items
Found items
Response Code
HTTP response code
Response Desc
HTTP response code description
URL Flags
URL and content state "flags" detected for URL
Crawler Flags
Crawler engine state "flags" for URL
Response Time
Response time (milliseconds)
Download Time
Download time (milliseconds)
Size
Size (bytes) of file/page
Word Count
Word count
Text content %
How much of the page content is from text
MIME
MIME content type
Charset
Character set and encoding
Language
Language and culture code
Last Modified
Last modified date/time returned through HTTP header or meta tag
Click.Nav.Count
Clicks it takes from website root to reach page (i.e. navigation length count)
LinkedBy.Count
Incoming links count found within website
LinkedBy.Anchors
Incoming anchor text links from within the website
UsedBy.Alt
Incoming alt text on uses from within the website
Used.HrefLang
Alternate hreflang values on uses from within the website
LinkedBy.List
Incoming links list from within the website
UsedBy.List
Incoming uses list from within the website
RedirectedBy.List
Incoming redirects list from within the website
Links.Intern.Count
Internal links in page content
Links.Extern.Count
Outgoing external links in page content
Uses.Intern.HrefLang
Alternate hreflang values on uses from within the website
Uses.Intern.List
Uses list from within the website
Uses.Extern.HrefLang
Alternate hreflang values on uses from within the website
Uses.Extern.List
Uses list from within the website
Content Quality
Attempts to calculate page quality based on a number of factors
Content Similarity
Shows most important "elements" visually. Sorting uses all available data to score and group pages with similar content.
Content Checksum
Checksum hash calculated on content using 32bit FNV-1a. If the sum is *0*: rehash using content length. If still *0*, use UInt32-Max
Page.Render.Thumb
Screenshot thumbnail of the rendered page
Redirects.Chain.Count
How many redirects are chained counting from this point
RedirectsTo.Path
Redirects or points to other path (HTTP, canonical etc.)
RedirectsTo.Code
Response code of referenced target URL (redirect or similar)
RedirectsTo.Path.Final
Final destination of redirects or points to other paths (HTTP, canonical etc.)
RedirectsTo.Code.Final
Response code of final referenced target URL (redirect or similar)
Importance.Raw
URL importance score (calculated from weighing all links across entire website)
Importance.Scaled
Importance score scaled (0-10)
HTML Errors
HTML validation warnings from W3C, Tidy or similar
CSS Errors
CSS validation warnings from W3C or similar
Spelling Errors
Spelling errors and warnings
Indexed.SearchEngine
If a URL is indexed in the selected search engine (0 = not checked, 250 = deep in results, 251 = not found, 255 = error)
Page.Search.Results
Results for page in custom text/code search
Page Title
Page title
Page Desc
Page description
Page.Keywords
Page keywords
Page.H1
Page H1 content
Page.H2
Page H2 content
Title Characters
Title characters count
Title Pixels
Title pixel width
Description Characters
Description characters count
Description Pixels
Description pixel width
Content Keywords
Top page keywords and phrases in content (using weighted keyword density analysis)
Clicks
Impressions
CTR
Backlinks score
Backlinks score is shown as a value inside the range 0..10
Type or select address / URL
You can type an address here and use the "Refresh" button (or hit the "Enter" key)
Type or select "browser" user agent
Leave empty to use default. Otherwise enter another user agent ID here (some websites may return different content)
Type or select phrase / keyword
Type or select a keyword or phrase
Data filter options
Choose between different common reporting presets that show/hide data columns and filter data
Control if URLs are shown using a "list view" or "tree view" and if URLs are percentage encoded "%"
Configure which data columns are visible
Special filtering modes that can be applied through the "quick data filter" button
Internal
External
Extended data
View file / URL
View source
W3C validate HTML
W3C validate CSS
Tidy validate
CSE validate
Details
Links [internal]
Links [external]
Linked by
Uses [internal]
Uses [external]
Used by
Redirected from
Directory summary
Response headers
Title
Save
Inside the stored data, you can change and keep the page title here
Lock
If you want this value to be persistent across recrawls use "lock" to protect this specific data
Response code ("GET" retrieves response code etc.)
Make a "GET" or "HEAD" HTTP request to shown data such as "reponse code" and similar
"GET"
Update data by making a HTTP "GET" request which retrieves HTTP headers and content
"HEAD"
Update data by making a HTTP "HEAD" request which retrieves HTTP headers
Importance score scaled
Incoming links weighted and transformed to a logarithm-based 0-10 scale
Fetch
Lock
If you want this value to be persistent across recrawls use "lock" to protect this specific data
Crawler and URL state flags
Save
Inside the stored data, you can change and keep the crawler and state flags here
Content downloaded
Analysis required
Analysis started
Analysis finished
Analysis content done
Analysis references done
Info request done ("head")
Full request done ("get")
Detected "robots.txt" noindex filter
Detected "robots.txt" disallow filter
Disallow in "robots.txt" is like noindex + nofollow combined
Detected meta/header "robots noindex"
Detected meta/header "robots nofollow"
Detected meta/header "robots noarchive"
Detected meta/header "robots nosnippet"
Detected link/header "canonical"
Detected AJAX fragment in URL !#
Detected AJAX fragment in URL !# "snapshot"
Detected AJAX fragment in HTML
Detected AJAX fragment in HTML "snapshot"
Detected "do not output" filter
This covers "output" filters
Detected "do not analyze" filter
This covers "analysis" filters
Detected "directory index file"
Detected "meta refresh redirect"
Detected as video file
Detected as video image
Detected as media thumbnail
Detected as external video embed page
Detected as robots.txt file
Detected as sitemaps protocol file
Detected as an orphan URL
With the data available after the last website scan, this URL appears not to be referenced from anywhere
Detected GoogleBot in imported log file
Detected Google indexed in imported file
Google PageRank
Fetch
Estimated change frequency
Calculation based on "importance score" and some HTTP headers
Fetch
Content mime type
Character set
Language / locale
Last modified
This checks "file last changed" for local files and server response header "last-modfied" for HTTP
Test
Save
Inside the stored data, you can change and keep the "last modified" here
Sub address (as stored)
If you want to create a folder, end the name with a slash "/"
Save
Inside the stored data, you can change and keep the relative address here
Part address
Full address (as stored)
Full address (as URL percentage encoded)
Full address (as URL percentage decoded)
Save
Inside the stored data, you can change and keep the full address here
Redirects to
|If the file resides on local disk, you can use menu "File - Update File" to save changes
No page to validate
No page to validate
Select a page or phrase to activate embedded browser
PDF to HTML conversion and parser
Click "bookmark" button. Download and extract files to disk, e.g. "c:\example\utility\tool.exe". Use this path as value.
Flash to HTML conversion and parser (e.g. Adobe "swf2html")
Click "bookmark" button. Download and extract files to disk, e.g. "c:\example\utility\tool.exe". Use this path as value.
Download OpenSSL/LibreSSL for "https" when usnig "Indy" HTTP engine. Extract files into:
Click "bookmark" button. Check you legally can use OpenSSL. Download and extract to program directory path
TIDY executable path (for HTML validation)
CSE HTML Validator *command line* executable path (for HTML/CSS validation)
Look for "cmdlineprocessor.exe" in your CSE installation directory
Enable OpenSSL for Indy HTTP communication with "https://" URLs
Is OpenSSL cryptography (algorithms, patents, importing etc.) legal in your jurisdiction?
External tools: Google (data centers and options)
Not used by the website crawler. Only used in other places when requested
Decide hosts to retrieve data from, e.g. "http://www.google.com/" (if multiple, retrieved data will sometimes be averaged)
Enable usage of PageRank checking functionality
Select stop words
The values indicate relative importance. 0 means no text is extracted from it.
Title text &lt;title&gt;&lt;/title&gt;
Header text &lt;hx&gt;&lt;/hx&gt;
Header &lt;h1&gt; weighs most. If "normal text" is 1 and "header text" is 3: H1 = 1 + 6/6 * (3-1), H6 = 1 + 1/6 * (3-1)
Anchor text &lt;a&gt;&lt;/a&gt;
Normal text
Image alternative text
&lt;img alt=""&gt; with the alternative text placed inside the pair of quotes
Tag attribute "title"
&lt;img title=""&gt; with the title text placed inside the pair of quotes
Meta description
Meta keywords
Words in URL
Stop words usage
End phrase markers
Removed from content
Page analysis: Result ranges
Min words in phrases
Max words in phrases
Max results per count-type
Max results for each phrase kind (e.g. 1-word type, 2-word type etc.)
Site analysis: Result ranges
Min words in phrases
Max words in phrases
Max results per count-type
Max results for each phrase kind (e.g. 1-word type, 2-word type etc.)
<#32>, (comma)
Insert a comma between all keywords
<#32>\s (space)
Insert a space between all keywords
<#32>\n (newline)
Insert a new line between all keywords
Split and show phrases with specific number of words. "*" shows all phrases with 1 to 5 words.
Limit the keywords list to a fixed number of characters. "*" = all. "#" = counts characters and allows editing.
Limit the text size to a fixed number of characters. "#" = counts characters and allows editing.
Input text here for keyword density check (used when no page / URL has been selected)
Phrase
Count 0
Count %
Weight 0
Weight %
Merged *
Merged %
Lock analysis results
When pressed, keywords page data will no longer be automatically updated when "Active address" changes
Analyze keyword density count and weight
Experimental visual representation
Raw text input
Keyword list output
Tools
Analyze active address
Analyze raw text input
Special: Uses site scan data
Remember that you may have to enable certain options in "scan website - data collection" for data to be available
Sum scores for selected pages
If you enabled "data collection - perform keyword analysis" for the site scan, you can sum and view the scores of the selected pages
Text weight in elements
Select which file, containing search engine retrieval details, you wish to use (only one)
Select which files, containing position engine retrieval details, you wish to use
Select which files, containing position engine retrieval details, you wish to use
Type the address of one or more websites. &lt;SelectedSite&gt; and &lt;SelectedPage&gt; automatically add "Active address".
Type the phrases you want to position check selected sites against. &lt;SelectedPhrase&gt; automatically adds "Active keyword phrase".
Input comes from listed "Phrases / words".
Add or edit files with stop words
Navigate to selected address
Click to switch on/off whether this should be shown in "quick" tabs
Edit the configuration of the selected "online tool"
To open the URL selected or typed into the dropdown box, click the first button to the right of it
Content keyword analysis
Analyze URLs list
Position analysis
Position checking
Position history
Keywords and lists
Keyword suggestions
Analyze
Analyze search results
Stop analysis
Stop analysis
Show information
Have "pressed" to show hints and warnings before fetching results
URLs to check
Type the website URLs you want analyzed (one at each line)
Load presets from textfile
Save presets to textfile
Tools
Analyze search results
Analyze search results
Stop analysis
Stop analysis
Show information
Have "pressed" to show hints and warnings before fetching results
Show debug views
Shows raw and processed data from first "result page"
Save to history
Save results to position check history data (used for graphs etc.)
1st SERP raw
1st SERP cleaned
1st SERP minimized
Regex
Config
Phrases to check (one per line)
Type the phrases you want to position check selected sites against. &lt;SelectedPhrase&gt; automatically adds "Active keyword phrase".
Load presets from textfile
Save presets to textfile
Tools
Engine and depth to check
top positions
Stop check
Stop position check
Position check
Start position check
Save to history
Save results to position check history data (used for graphs etc.)
Show information
Have "pressed" to show hints and warnings before fetching results
Show debug views
Shows raw and processed data from first "result page"
Tools
Addresses to find (one per line)
Phrases to check (one per line)
Search engines
Save presets to textfile
Load presets from textfile
Save presets to textfile
Load presets from textfile
Save presets to textfile
Load presets from textfile
Multiply default delays
Add URL paths to results
Alias root domain variations
Combine keyword lists. Write each keyword phrase on a line. Separate all keyword lists with an empty line between
Enter one or more phrases. Use the tools underneath to quickly explode keyword lists
Combine to input
Combine keyword lists into output - each separated with an empty line
Spintax to input
Spin syntax (spintax) content for use in various article/content submission tools
Diff to input
Example: "a, b" diff "b, c" diff "c, d" results in "a, d" as "a" and "d" both only occur once
Clear lists
Space between words
Add space between selected keywords from each group/list
Combine keyword lists
Spintax keyword lists
Diff keyword lists
Add to input
Replace input
Clear input
Add to output
Replace output
Clear output
Input to output tools
Output tools
Permutate words
Input to output: Cover word permutations such as "tools power" (instead of "power tools")
Missing space
Input to output: Cover typo errors such as "powertools" (instead of "power tools")
Missing letter
Input to output: Cover typo errors such as "someting" (missing "h")
Switched letter
Input to output: Cover typo errors such as "somehting" ("h" and "t")
Tidy
In output: Trim for superfluous spaces
No repetition
In output: Avoid immediate repeating words in same phrase (e.g. "deluxe deluxe tools")
No duplicates
In output: Remove duplicated phrases (e.g. if two "power tools")
No permutations
In output: Remove permutated duplicate phrases (e.g. "power tools" and "tools power")
No whitespaces
In output: Remove all spaces, e.g. to fix domain name lists: "word1 word2 . com" = "word1word2.com"
Suggest related
Uses input to suggest related phrases
Cancel suggest
Show information
Have "pressed" to show hints and warnings before fetching results
Add to output
Replace output
Clear input
Add to input
Replace input
Clear output
&gt; Delete based on negative keyword
Remove
Remove phrases that contain the negative keyword(s)
Clean
Remove the negative keyword(s) from phrases
&gt; Delete based on characters
Remove
Remove phrases that contain characters not in text
Clean
Remove characters in phrases that are not in text
&gt; Delete based on counts
Remove
Remove phrases based on character/word count
Prepend
Input to output: Copy and add all phrases with above text inserted at beginning
Append
Input to output: Copy and add all phrases with above text added to the end
" "
Input to output: Cover " " ... phrase match in Google Adwords
[ ]
Input to output: Cover [ ] ... exact match in Google Adwords
+
Input to output: Cover + ... modified broad match in Google Adwords
Tools
Input
Output
Combined analysis data
Reset data
Keyword lists
Quick examples
Combine options
Input
Output
Visuals
Tools
From/till date
Search engines
Keyword phrase
Websites
Show data and apply filters
Retrieve all data for phrase that match filters (if no phrase selected, dropdown box will fetch all available)
Update data automatically
Include data from date
Include data till date
Select search engines
Select websites
Set whether to show. Checked = show if available data. Green = has available data. Red = has no data for active phrase.
Same as the other, but switches all items instead
Set whether to show. Checked = show if available data. Green = has available data. Red = has no data for active phrase.
Same as the other, but switches all items instead
Change scale in graph
Switch between e.g. normal and logarithmic scale
Show legend in graph
Switch between e.g. show and hide legend
Show marks in graph
Switch between e.g. show and hide marks
Reset data
Build selected
Generate a sitemap of the selected type (e.g. Google XML sitemap)
Build all
Build all kinds of sitemaps supported
Quick presets...
View various configuration examples (can speed up creating new projects)
Sitemap file paths
URL options
Document options
XML sitemap options
XML sitemap extensions
HTML template code and options
Options
Code
Select the sitemap file kind to build
Use the dropdown arrow to select which kind of sitemap you want to generate
Standard XML sitemap file output path
Image XML sitemap file output path
Video XML sitemap file output path
Mobile XML sitemap file output path
News XML sitemap file output path
Code XML sitemap file output path
RSS sitemap file output path
ROR sitemap file output path
Google Base sitemap file output path
text sitemap one-URL-per-line file output path
Template sitemap HTML/CSS/custom file output path
ASP.net (controls) Web.sitemap file output path
GraphViz DOT ".gv" data file output path
Use GraphViz/DotEditor, GraphvizOnline, Cytoscape/dot-app, Gephi etc. to visualize websites. Note: Not included in "Build all"
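For orientation, a minimal sketch of what a GraphViz DOT ".gv" file for a tiny crawl could look like; the URLs below are placeholders, and the exact node/edge attributes written by the program may differ:

    digraph website {
      // each node is a URL; each edge is a link found during the crawl
      "https://www.example.com/" -> "https://www.example.com/about.html";
      "https://www.example.com/" -> "https://www.example.com/products/";
      "https://www.example.com/products/" -> "https://www.example.com/products/tools.html";
    }

A file in this form opens directly in Graphviz and can be imported into Gephi or Cytoscape for visualization.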
Items as linked descriptions
Prefer &lt;title&gt;&lt;/title&gt;
Prefer raw paths
Prefer beautified paths
Auto detection is based on root path type
Beautified paths
Convert separators to spaces
Upcase first letter in first word
Upcase first letter in all following words
Directories as items in own lists
Ignore
Item (normal)
Directories as headlines
Ignore
Prefer path
Prefer path (linked)
Prefer title (linked)
Prefer path + title (linked)
Set path options used in sitemap
Links use full address
Override and convert slashes used in links
Layout
Columns per page
With a value above 1, links will be spread among the columns
Links per page
0 means all links will be on page 1
If multiple pages in sitemap, link all at bottom
The alternative is to have "start", "prior", "selected", "next" and "end" shown
Set path root used in sitemap
Can be useful if you e.g. scanned "http://localhost" but the sitemap is for "http://www.example.com"
Add relative path to "header root link" in template sitemaps
Have e.g. "index.html" (http://example.com/index.hml) instead of "" (http://example.com/) as the "root header link" in sitemap
Character set and type
Always save sitemap files as UTF-8
The option only has influence on sitemap types where UTF-8 is optional, e.g. HTML/template sitemaps
Save UTF-8 sitemap files with BOM
The byte-order mark option only has influence when no standard specifies whether a BOM should be included
Generated sitemap files: Include URLs with response codes:
Generated sitemap files: Options
Remove URLs excluded by "webmaster" and "output" filters
Removes URLs excluded by "output filters", "noindex" and "robots.txt"
Sort URLs in the sitemap by "importance" instead of "structure and alphabet"
Sort URLs similarly to "Analyze website"
Include response code "rcNoRequest" URLs
Items with response code "rcNoRequest" / "-1" will only have text and no link in generated sitemap files
Exclude URLs that are not normal pages (e.g. images)
This option is useful if your scan settings included e.g. images, but you do not want these listed in your HTML sitemap
Control
Disable "template code" for empty directories
Useful in some cases, such as avoiding &lt;ul&gt;&lt;/ul&gt; (which fails the W3C HTML validator)
Enable root headline "template code"
If checked, the "Code : Root headline ..." will, together with the "root directory", be inserted underneath "Code : Header"
Convert from characters to entities
Convert from "&amp;" to "&amp;amp;", "&lt;" to "&amp;lt;", "&gt;" to "&amp;gt;" etc. in titles and URLs
Always create a sitemap index file even if not strictly necessary
Prevent "extreme" calculated &lt;priority&gt; and &lt;changefreq&gt; values
Influences the conversions done from calculated values into Google XML sitemap ranges
Override calculated values
Override Priority
Use auto "priority" calculation or set all to same value. Minus "-" removes the tag. Used as fallback for "change frequency".
Override ChangeFreq
Use auto "change frequency" calculation or set all to same value. Star "*" removes the tag.
Override LastMod with chosen date/time
Use auto "last modification" calculation or set all to same value. "Reset" sets value to "Dec 30th 1899" which removes the tag.
Reset
LastMod time zone configuration
Override the timezone part of the "LastMod" timestamps
Override with GMT timezone modifier
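As a reference for the override options above, a hedged sketch of a standard XML sitemap entry showing where "priority", "changefreq" and "lastmod" end up; the URL, values and timezone offset are placeholders rather than program output:

    <?xml version="1.0" encoding="UTF-8"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <url>
        <loc>https://www.example.com/page.html</loc>
        <lastmod>2020-10-25T23:34:49+00:00</lastmod>
        <changefreq>weekly</changefreq>
        <priority>0.8</priority>
      </url>
    </urlset>

Overriding "LastMod" with a GMT modifier affects the "+00:00" part shown here, while the "-", "*" and "Reset" overrides simply omit the corresponding tag.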
XML sitemap file options
Add XML necessary for validation of generated sitemap file(s)
Include "hreflang" alternate URLs in XML sitemap files (slows building, but see progress bar at bottom right)
Apply Gzip to a copy of the generated sitemap file(s)
This "gzip" copy will have name "sitemap.xml.gz" if the original name is "sitemap.xml"
Apply "minimize" to the generated sitemap files(s)
Eliminates most superfluous whitespace
XSL for XML sitemap files
XSL means "extensible stylesheet language". XSL can transform XML files so they become nicer looking in internet browsers
Max URLs in each sitemap before automatic split across multiple sitemap files.
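To illustrate the file options above, a hedged sketch with placeholder file names and URLs: an XSL stylesheet reference, an "hreflang" alternate entry, and the kind of sitemap index file that results when the URL limit forces a split into multiple sitemap files.

    Example "sitemap-1.xml":

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="sitemap.xsl"?>
    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:xhtml="http://www.w3.org/1999/xhtml">
      <url>
        <loc>https://www.example.com/en/page.html</loc>
        <xhtml:link rel="alternate" hreflang="en" href="https://www.example.com/en/page.html"/>
        <xhtml:link rel="alternate" hreflang="de" href="https://www.example.com/de/page.html"/>
      </url>
    </urlset>

    Example sitemap index referencing the split files:

    <?xml version="1.0" encoding="UTF-8"?>
    <sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
      <sitemap>
        <loc>https://www.example.com/sitemap-1.xml</loc>
      </sitemap>
      <sitemap>
        <loc>https://www.example.com/sitemap-2.xml</loc>
      </sitemap>
    </sitemapindex>

The gzip option would additionally write compressed copies such as "sitemap-1.xml.gz" next to the originals.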
Google news sitemaps
Publisher name
Language default
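A hedged sketch of the Google news extension the "Publisher name" and "Language default" settings feed into; the publication name, language, date and URL are placeholders:

    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:news="http://www.google.com/schemas/sitemap-news/0.9">
      <url>
        <loc>https://www.example.com/news/article.html</loc>
        <news:news>
          <news:publication>
            <news:name>Example Publisher</news:name>
            <news:language>en</news:language>
          </news:publication>
          <news:publication_date>2020-10-25</news:publication_date>
          <news:title>Example article title</news:title>
        </news:news>
      </url>
    </urlset>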
Google image sitemaps
Ensure each image is only listed once
Include images used that are externally hosted
Create "image:caption" + "image:title" values *even* if no image "alt" or "title" attributes
Google video sitemaps
Prioritize using the page title/description instead of the best video title/description found
Create local thumbnail images when it is an advantage to do so
Useful if your website does not contain proper video thumbnail images
Ensure each video is only listed once
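A hedged sketch of a Google video sitemap entry; the thumbnail, title, description and content locations are placeholders, and the tags actually written depend on the data found for each video:

    <urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9"
            xmlns:video="http://www.google.com/schemas/sitemap-video/1.1">
      <url>
        <loc>https://www.example.com/videos/intro.html</loc>
        <video:video>
          <video:thumbnail_loc>https://www.example.com/thumbs/intro.jpg</video:thumbnail_loc>
          <video:title>Example video title</video:title>
          <video:description>Example video description</video:description>
          <video:content_loc>https://www.example.com/media/intro.mp4</video:content_loc>
        </video:video>
      </url>
    </urlset>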
Code : Header
Code : Footer start
Code : Footer navigation start
Code : Footer navigation end
Code : Footer navigation items address start
Code : Footer navigation items address end
Code : Footer navigation items title start
Code : Footer navigation items title end
Code : Footer navigation items spacer
Code : Footer end
Code : Start of headline before start of directory
Code : End of headline before start of directory
Code : Start of directory
Code : End of directory
Code : Before headline / directory combination
Code : After headline / directory combination
Code : Start of item link address start
Code : Start of item link address end
Code : Start of item link
Code : End of item link
Code : End of item link title start
Code : End of item link title end
Code : Start of headline link address start
Code : Start of headline link address end
Code : End of headline link title start
Code : End of headline link title end
Code : Column start
Code : Column end
Code : Root headline start
Code : Root headline end
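The "Code : ..." fields above are the building blocks the HTML template sitemap is assembled from. As a rough, hedged illustration only (not the program's exact assembly order), values based on an unordered list could yield a fragment like:

    <ul>
      <li><a href="https://www.example.com/about.html">About</a></li>
      <li><a href="https://www.example.com/contact.html">Contact</a></li>
    </ul>

Avoiding empty list markup for directories without items is what the "Disable template code for empty directories" option above addresses.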
Options
Add robots.txt directives
Create robots.txt
Path of robots.txt file
Create robots.txt options
Add "disallow" urls based on exclude "crawl" and "output" path filters
Add "XML sitemaps autodiscovery"
Upload last created
Upload all files associated with the last type selected and created
Upload all
Upload all files found for all types
Upload all + robots.txt
Upload all files found for all types + robots.txt
Quick presets...
View various configuration examples (can speed up creating new projects)
FTP options
Upload progress
Host and port number
Upload directory path
Protocol
If you intend to use "ftps" please make sure you have enabled OpenSSL in "General option and tools"
Connection mode
Transfer mode
Username
Password
Obfuscate FTP password between project save and load
Add common pings
View various configuration examples (can speed up creating new projects)
Ping now
Ping options
Ping progress
Addresses to ping
Some services support notifications simply by requesting a specific address
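As an example of the general form such notification addresses take (the exact services and endpoints change over time and are only an assumption here), a sitemap ping has typically looked like:

    https://www.google.com/ping?sitemap=https://www.example.com/sitemap.xml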
To open the file selected or typed into the text box, simply click the first button to the right of it
Switch between text and browser view (useful for HTML, XML, CSV etc.)
Open the file selected in the dropdown in the text editor underneath
Open the page in NotePad
Open the page in IE
Open the page in FireFox
Open the page in Opera
Open the page in Safari
Open the page in Chrome
File information
Path:
Saved with version:
Date information
Project created:
Project last modified:
Dynamic help
To open the address typed into the text box, simply click the first button to the right of it
Navigate embedded browser to selected address
Open the page in IE (may give a better viewing experience)
Open the page in Firefox (may give a better viewing experience)
Open the page in Opera (may give a better viewing experience)
Open the page in Safari (may give a better viewing experience)
Open the page in Chrome (may give a better viewing experience)
To open the file selected or typed into the text box, simply click one of the buttons to the right of it
Navigate embedded browser to selected address
Open the page in IE (may give a better viewing experience)
Open the page in Firefox (may give a better viewing experience)
Open the page in Opera (may give a better viewing experience)
Open the page in Safari (may give a better viewing experience)
Open the page in Chrome (may give a better viewing experience)
Default storage file name
If URLs have incompatible file names when being saved to disk, they are renamed "&lt;default&gt;0, &lt;default&gt;1, &lt;default&gt;2" etc.
Download options
Convert to relative links for browsing files offline on local disk
Convert to relative links for general usage including uploading to a webserver / website
No conversion
Auto detection is based on root path type
Convert URL paths in downloaded content
Add .html file extension to downloaded page content URLs (good for offline viewing)
Prioritize keeping downloaded URL file names on disk persistent across multiple downloads
Checking this option will also ensure downloaded URLs keep as much as possible of their original names in the file names on disk
Convert paths and keep all downloaded files at the root on disk
Download used images and similar files residing on external domains
Tool paths
Internet crawler
Regex test tool
Paths
General options
Build now
Search engine files: Directory path on disk
Search engine files: Search page file name
Search engine config: Override website root in files
If you scan your website at "http://localhost/", you can override the "root" address used in the search engine files
Page is an "index" file (i.e. the directory path is used: "example.com/search/?search_p=phrase")
Kind of search engine to build
You can initiate a search through e.g. a &lt;form&gt; that calls "msa1_search.html?search_p=example"
If "search" is base then parameters format is: "search_p", "search_c" etc.
Results to show per page
Presentation and data in search results
Show "scores" in search results
Show "description" in search results
HTML structure used for presenting search results
Log searches to webserver statistics
Log searches to server logs through an extra HTTP request: "msa1-logsearch.html?search_p=example"
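A hedged sketch of a plain HTML form that would start a search against the generated files using the "search_p" parameter mentioned above; the file name follows the "msa1_search.html?search_p=example" pattern, and everything else is placeholder markup:

    <form action="msa1_search.html" method="get">
      <input type="text" name="search_p" value="">
      <input type="submit" value="Search">
    </form>

Submitting this form with GET produces a request of the form "msa1_search.html?search_p=example".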



Get Sitemap Generator now : Create Text, HTML, RSS and Google XML website sitemaps to help search engines index and crawl all pages.




This language file is part of Keyword Research / Website Download / Sitemap Generator / Website Analyzer. All rights reserved. See legal.