urlfilter.regex.file
    Value: crawl-urlfilter.txt

db.ignore.internal.links
    Value: false
    If true, when adding new links to a page, links from the same host are ignored. This is an effective way to limit the size of the link database, keeping only the highest-quality links.

fetcher.server.delay
    Value: 1.0
    The number of seconds the fetcher will delay between successive requests to the same server.

http.max.delays
    Value: 1000
    The number of times a thread will delay when trying to fetch a page. When using the crawl tool there are likely to be very few different hosts, so we need to be willing to wait longer for each.
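
In a Nutch configuration file, the settings listed above would typically be written as property elements. The fragment below is only a sketch: it assumes the usual configuration/property/name/value layout used by Nutch configuration files, it shortens the descriptions to comments, and it does not imply which file these overrides belong in, since the source does not say.

```xml
<?xml version="1.0"?>
<!-- Sketch only: assumes the common <configuration>/<property> layout. -->
<configuration>

  <!-- File containing the regex URL filter rules. -->
  <property>
    <name>urlfilter.regex.file</name>
    <value>crawl-urlfilter.txt</value>
  </property>

  <!-- false: links from the same host are kept in the link database. -->
  <property>
    <name>db.ignore.internal.links</name>
    <value>false</value>
  </property>

  <!-- Seconds to wait between successive requests to the same server. -->
  <property>
    <name>fetcher.server.delay</name>
    <value>1.0</value>
  </property>

  <!-- Number of times a thread will delay when trying to fetch a page. -->
  <property>
    <name>http.max.delays</name>
    <value>1000</value>
  </property>

</configuration>
```

With a small number of hosts, a high http.max.delays combined with a 1.0 second fetcher.server.delay lets threads keep waiting on a busy host rather than giving up on the page early.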