Note: We have a video tutorial:
Although the video demonstration uses TechSEO360, some of it also applies to users of A1 Website Download.
Before doing anything else, import the list of pages you want. You can do so from the
File menu.
In the newest versions, the menu item used for importing is titled
Import URLs and data from file using "smart" mode...
Select a file containing the list of URLs you wish to import. Supported formats include .CSV, .SQL and .TXT.
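To illustrate how a list of URLs can be pulled out of files in such different formats, here is a minimal sketch. It is not how A1 Website Download parses files internally; the function name `extract_urls` and the regular expression are assumptions chosen for illustration, and the pattern deliberately stops at commas and quotes so it works on simple CSV and SQL text.

```python
import re

# Assumed pattern: an http(s) URL ends at whitespace, quotes, brackets,
# commas or semicolons. Real-world URLs may contain some of these characters.
URL_PATTERN = re.compile(r"https?://[^\s\"'<>,;()]+")


def extract_urls(text):
    """Return the URLs found in text, in order, without duplicates."""
    return list(dict.fromkeys(URL_PATTERN.findall(text)))
```

Reading a file with `open(path, encoding="utf-8").read()` and passing the result to `extract_urls` would give a list you can compare against what the import shows.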
The software will automatically (try to) determine which URLs go into the
internal and
external tabs.
It does so by checking where the majority of the imported URLs come from:
- If most are from the same domain, those are placed in the internal category tab. (The rest will be ignored.)
- If they are from multiple domains, those are placed in the external category tab. (The rest will be ignored.)
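The majority rule described above can be sketched roughly as follows. This is only an illustration of the idea, not the program's actual logic: the function name `classify_imported_urls`, the "more than half" threshold, and keeping every URL with a parseable host in the external case are all assumptions.

```python
from collections import Counter
from urllib.parse import urlsplit


def classify_imported_urls(urls):
    """Return (tab, kept_urls) using a simple majority-host rule."""
    hosts = [urlsplit(u).hostname or "" for u in urls]
    counts = Counter(h for h in hosts if h)
    if not counts:
        return "external", []
    top_host, top_count = counts.most_common(1)[0]
    if top_count > len(urls) / 2:
        # Assumption: "majority" means more than half share one host.
        # Only URLs on that host are kept; the rest are ignored.
        kept = [u for u, h in zip(urls, hosts) if h == top_host]
        return "internal", kept
    # Assumption: with no majority host, every URL with a host is kept.
    kept = [u for u, h in zip(urls, hosts) if h]
    return "external", kept
```

For example, a list with two `example.com` URLs and one `other.org` URL would be treated as internal, and the `other.org` URL would be dropped.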
Note: If you already have existing website data loaded, A1 Website Download will add the imported URLs if the root domain is the same.
Note: To force all imported URLs into the
external category tab, you can use
File | Import URLs and data from file into "external" list...
Crawling imported URLs belonging to a single website is straightforward.
Before starting the scan after import, select one of the
recrawl options:
- Scan website | Recrawl (full) - this will also crawl new URLs found during the scan.
- Scan website | Recrawl (listed only) - this will avoid including any new URLs in analysis or scan results.
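The difference between the two recrawl options can be pictured with a toy crawler over an in-memory link graph. This is a hedged sketch only: the function `recrawl`, the `links` dictionary, and the `listed_only` flag are hypothetical names, and the real crawler does far more than this.

```python
from collections import deque


def recrawl(seed_urls, links, listed_only):
    """Breadth-first walk over a toy link graph.

    links maps each URL to the URLs found on that page (made-up data).
    """
    start = list(dict.fromkeys(seed_urls))  # drop duplicate seeds
    seen = set(start)
    queue = deque(start)
    results = []
    while queue:
        url = queue.popleft()
        results.append(url)
        for target in links.get(url, []):
            if target in seen:
                continue
            if listed_only:
                continue  # "listed only": newly found URLs are skipped
            seen.add(target)
            queue.append(target)
    return results
```

With `listed_only=True` the result contains only the imported URLs, while `listed_only=False` also returns pages discovered through links.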
You can now click the
Start scan button.
An easy way to limit the crawl of internal URLs is to use the button shown in the picture below.
This will add all selected website URLs to a
limit include to list in both
analysis filters
and
output filters.
Note: if you want to limit which URLs to include in recrawls, it is often easier to switch the left view to
list mode.
Note: If you want to have URLs checked that are not in the imported list, you will need to ensure the crawler is allowed to analyze and include them in results.
Note: Remember to keep the following options checked if you use output filters:
- Older versions: Scan website | Crawler options | Apply "webmaster" and "output" filters after website scan stops
- Newer versions: Scan website | Output filters | After website scan stops: Remove URLs excluded
That way, only the URLs you are interested in will be shown after the site crawl has finished.
Note: If you forget to use one of the
recrawl modes and you use
limit crawl to filters, the scan may be unable to start if you have excluded all the URLs the site crawl would start from.
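Why a scan can fail to start in this situation is easy to see in a small sketch. The code below assumes, purely for illustration, that a limit filter is a list of URL prefixes; the program's own filter syntax is richer, and the names `passes_limit_filter` and `usable_start_urls` are hypothetical.

```python
def passes_limit_filter(url, limit_to):
    """True if no limit list is set or the URL starts with one of its entries."""
    return not limit_to or any(url.startswith(prefix) for prefix in limit_to)


def usable_start_urls(start_urls, limit_to):
    """Return the start URLs that survive the limit filter."""
    return [u for u in start_urls if passes_limit_filter(u, limit_to)]
```

If the only start URL is `https://example.com/` and the limit list contains just `https://example.com/blog/`, `usable_start_urls` returns an empty list, which corresponds to a crawl with nothing to begin from.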