Changeset 80 for nutchez-0.1/conf
Timestamp: Jun 9, 2009, 6:08:55 PM
Location: nutchez-0.1/conf
Files: 2 edited
nutchez-0.1/conf/nutch-site.xml
--- nutchez-0.1/conf/nutch-site.xml (r74)
+++ nutchez-0.1/conf/nutch-site.xml (r80)
@@ -66,4 +66,19 @@
 </description>
 </property>
 <property>
+<name>db.ignore.external.links</name>
+<value>false</value>
+<description>If true, outlinks leading from a page to external hosts
+will be ignored. This is an effective way to limit the crawl to include
+only initially injected hosts, without creating complex URLFilters.
+</description>
+</property>
+<property>
+<name>file.content.limit</name>
+<value>1000000</value>
+<description>The length limit for downloaded content, in bytes.
+If this value is nonnegative (>=0), content longer than it will be truncated;
+otherwise, no truncation at all.
+</description>
+</property>
 </configuration>
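The truncation rule in the file.content.limit description can be illustrated with a small sketch. This is not Nutch's own code: `SITE_XML`, `load_properties`, and `apply_content_limit` are hypothetical names used only here, and the embedded XML is a reduced copy of the two properties added in r80.

```python
import xml.etree.ElementTree as ET

# Reduced copy of the two properties added in r80 (descriptions omitted).
SITE_XML = """<configuration>
<property>
<name>db.ignore.external.links</name>
<value>false</value>
</property>
<property>
<name>file.content.limit</name>
<value>1000000</value>
</property>
</configuration>"""


def load_properties(xml_text):
    """Return a {name: value} dict from a Hadoop-style <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def apply_content_limit(content, limit):
    """Truncate content to `limit` bytes when limit >= 0; otherwise return it unchanged."""
    if limit >= 0:
        return content[:limit]
    return content


props = load_properties(SITE_XML)
limit = int(props["file.content.limit"])
truncated = apply_content_limit(b"x" * 1500000, limit)
```

With the value from r80, a 1,500,000-byte payload is cut to 1,000,000 bytes, while a negative limit would leave it untouched.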
nutchez-0.1/conf/sav/n.urls.txt
--- nutchez-0.1/conf/sav/n.urls.txt (r76)
+++ nutchez-0.1/conf/sav/n.urls.txt (r80)
@@ -1,2 +1,2 @@
-http://www.nchc.org.tw
+http://www.nchc.org.tw/tw/
 http://www.hadoop.tw
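How db.ignore.external.links interacts with these seed URLs can be sketched as follows, based only on the property's description (outlinks from a page to external hosts are ignored when the value is true). The helper `keep_outlink` is a hypothetical name, and plain host-name comparison is an assumption rather than Nutch's actual logic.

```python
from urllib.parse import urlsplit


def keep_outlink(page_url, outlink_url, ignore_external):
    """Decide whether an outlink should be kept.

    Assumption: "external" means the outlink's host name differs from
    the host name of the page it was found on.
    """
    if not ignore_external:
        # Value "false" (as set in r80): every outlink is kept.
        return True
    return urlsplit(page_url).hostname == urlsplit(outlink_url).hostname


seed = "http://www.nchc.org.tw/tw/"
same_host = keep_outlink(seed, "http://www.nchc.org.tw/tw/about", True)
other_host = keep_outlink(seed, "http://www.hadoop.tw", True)
unrestricted = keep_outlink(seed, "http://www.hadoop.tw", False)
```

Under this assumption, setting the value to true would confine the crawl to the hosts in the seed list; with false, as committed here, links to other hosts are followed.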