urlfilter.regex.file
crawl-urlfilter.txt
db.ignore.internal.links
false
If true, when adding new links to a page, links from
the same host are ignored. This is an effective way to limit the
size of the link database, keeping only the highest quality
links.
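
The effect of db.ignore.internal.links can be illustrated with a minimal Python sketch. This is not Nutch's implementation; the function name filter_inlinks and its parameters are hypothetical, and "same host" is assumed to mean an exact hostname match.

```python
from urllib.parse import urlparse


def filter_inlinks(page_url, source_urls, ignore_internal_links=True):
    """Return the incoming links that would be kept for page_url.

    Hypothetical helper: source_urls are the pages that link to page_url.
    """
    if not ignore_internal_links:
        return list(source_urls)
    page_host = urlparse(page_url).hostname
    # Drop every link whose source page lives on the same host as the target.
    return [u for u in source_urls if urlparse(u).hostname != page_host]
```

With the flag set to false, as in this file, every link is kept, which suits a crawl confined to a few hosts where most links are internal.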
fetcher.server.delay
1.0
The number of seconds the fetcher will delay between
successive requests to the same server.
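
A per-server delay of this kind can be sketched as follows. The class ServerDelay and its methods are hypothetical names used only for illustration, and the sketch assumes the delay is counted from the moment the previous request to that host was issued.

```python
class ServerDelay:
    """Tracks when each host may next be contacted (illustrative only)."""

    def __init__(self, delay_seconds=1.0):
        self.delay_seconds = delay_seconds
        self._next_allowed = {}  # host -> earliest time of the next request

    def seconds_to_wait(self, host, now):
        # A host that has never been contacted needs no wait.
        return max(0.0, self._next_allowed.get(host, 0.0) - now)

    def record_request(self, host, now):
        # After a request, the host is off-limits for delay_seconds.
        self._next_allowed[host] = now + self.delay_seconds
```

Requests to different hosts do not delay each other in this sketch; only successive requests to the same host are spaced out.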
http.max.delays
1000
The number of times a thread will delay when trying to
fetch a page. When using the crawl tool there are likely to be very
few different hosts, so we need to be willing to wait longer for
each.
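
How a delay limit like this might interact with the server delay is sketched below. It assumes that each delay lasts fetcher.server.delay seconds and that the thread gives up on the page once the limit is reached; the function try_fetch and the host_is_busy callback are hypothetical.

```python
import time


def try_fetch(host_is_busy, max_delays=1000, server_delay=1.0, sleep=time.sleep):
    """Wait for a busy host, giving up after max_delays waits.

    Returns (ready, delays_used). host_is_busy is a hypothetical callback
    reporting whether another request to the host is still being delayed.
    """
    delays = 0
    while host_is_busy():
        if delays >= max_delays:
            return False, delays  # give up on this page for now
        sleep(server_delay)
        delays += 1
    return True, delays
```

Under these assumptions, the values in this file let a thread wait up to 1000 x 1.0 = 1000 seconds for a single page.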