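<property>
  <name>http.agent.name</name>
  <value>user</value>
  <description>HTTP 'User-Agent' request header.</description>
</property>

<property>
  <name>http.agent.description</name>
  <value>MyTest</value>
  <description>Further description</description>
</property>

<property>
  <name>http.agent.url</name>
  <value>localhost</value>
  <description>A URL to advertise in the User-Agent header.</description>
</property>

<property>
  <name>http.agent.email</name>
  <value>you@yous</value>
  <description>An email address</description>
</property>

<property>
  <name>plugin.includes</name>
  <value>protocol-(http|httpclient)|urlfilter-regex|parse-(text|html|js|ext|msexcel|mspowerpoint|msword|oo|pdf|rss|swf|zip)|index-(more|basic|anchor)|query-(more|basic|site|url)|response-(json|xml)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  <description>Regular expression naming plugin directory names</description>
</property>

<property>
  <name>parse.plugin.file</name>
  <value>parse-plugins.xml</value>
  <description>The name of the file that defines the associations between content-types and parsers.</description>
</property>

<property>
  <name>db.max.outlinks.per.page</name>
  <value>-1</value>
</property>

<property>
  <name>http.content.limit</name>
  <value>-1</value>
</property>

<property>
  <name>indexer.mergeFactor</name>
  <value>500</value>
  <description>The factor that determines the frequency of Lucene segment merges. This must not be less than 2, higher values increase indexing speed but lead to increased RAM usage, and increase the number of open file handles (which may lead to "Too many open files" errors). NOTE: the "segments" here have nothing to do with Nutch segments, they are a low-level data unit used by Lucene.</description>
</property>

<property>
  <name>indexer.minMergeDocs</name>
  <value>500</value>
  <description>This number determines the minimum number of Lucene Documents buffered in memory between Lucene segment merges. Larger values increase indexing speed and increase RAM usage.</description>
</property>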
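Because plugin.includes is a regular expression over plugin directory names, it can be useful to check which names it would select before starting a crawl. The following is a minimal Python sketch, not part of Nutch itself. It assumes the whole directory name must match the expression (full-match semantics), and the helper `is_plugin_included` and the sample directory names are hypothetical, chosen only for illustration.

```python
import re

# Value of plugin.includes, copied from the configuration above.
PLUGIN_INCLUDES = (
    r"protocol-(http|httpclient)|urlfilter-regex|"
    r"parse-(text|html|js|ext|msexcel|mspowerpoint|msword|oo|pdf|rss|swf|zip)|"
    r"index-(more|basic|anchor)|query-(more|basic|site|url)|"
    r"response-(json|xml)|summary-basic|scoring-opic|"
    r"urlnormalizer-(pass|regex|basic)"
)


def is_plugin_included(directory_name, pattern=PLUGIN_INCLUDES):
    """Return True if the entire directory name matches the pattern.

    Assumption: the name must match as a whole, not just contain a match.
    """
    return re.fullmatch(pattern, directory_name) is not None


if __name__ == "__main__":
    # Hypothetical directory names, used only to exercise the pattern.
    for name in ["parse-pdf", "protocol-httpclient", "protocol-ftp", "parse-pdfx"]:
        status = "included" if is_plugin_included(name) else "excluded"
        print(f"{name}: {status}")
```

Under the full-match assumption, a name such as protocol-ftp is excluded because ftp is not one of the alternatives listed after protocol-, so enabling it would mean extending that group in the expression.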