Changeset 80 for nutchez-0.1/conf


Timestamp:
Jun 9, 2009, 6:08:55 PM
Author:
waue
Message:

NutchEz ..0.1

Location:
nutchez-0.1/conf
Files:
2 edited

  • nutchez-0.1/conf/nutch-site.xml

    --- r74
    +++ r80
    @@ -66,4 +66,19 @@
       </description>
     </property>
    -
    +<property>
    +  <name>db.ignore.external.links</name>
    +  <value>false</value>
    +  <description>If true, outlinks leading from a page to external hosts
    +  will be ignored. This is an effective way to limit the crawl to include
    +  only initially injected hosts, without creating complex URLFilters.
    +  </description>
    +</property>
    +<property>
    +  <name>file.content.limit</name>
    +  <value>1000000</value>
    +  <description>The length limit for downloaded content, in bytes.
    +  If this value is nonnegative (>=0), content longer than it will be truncated;
    +  otherwise, no truncation at all.
    +  </description>
    +</property>
     </configuration>
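The behavior described for the two properties added to nutch-site.xml can be modeled in a few lines. This is a minimal Python sketch, not Nutch's own (Java) implementation: load_properties, apply_content_limit and filter_outlinks are hypothetical helpers that only restate the property descriptions, and treating "external hosts" as "a different hostname from the linking page" is an assumption.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlparse

# Name/value pairs of the two properties added in r80 (descriptions omitted),
# wrapped in a <configuration> root so the fragment parses on its own.
NUTCH_SITE_FRAGMENT = """<configuration>
<property>
  <name>db.ignore.external.links</name>
  <value>false</value>
</property>
<property>
  <name>file.content.limit</name>
  <value>1000000</value>
</property>
</configuration>"""


def load_properties(xml_text):
    """Hypothetical helper: return {name: value} from a <configuration> document."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        props[prop.findtext("name").strip()] = prop.findtext("value").strip()
    return props


def apply_content_limit(content, limit):
    """Truncate to `limit` bytes when limit >= 0; a negative limit means no truncation."""
    if limit >= 0:
        return content[:limit]
    return content


def filter_outlinks(page_url, outlinks, ignore_external):
    """When ignore_external is True, keep only outlinks on the page's own host
    (assumption: exact hostname comparison)."""
    if not ignore_external:
        return list(outlinks)
    page_host = urlparse(page_url).hostname
    return [link for link in outlinks if urlparse(link).hostname == page_host]


props = load_properties(NUTCH_SITE_FRAGMENT)
limit = int(props["file.content.limit"])
ignore_external = props["db.ignore.external.links"].lower() == "true"
```

With the r80 values, ignore_external is False, so outlinks to other hosts are kept, and per the file.content.limit description any content longer than 1,000,000 bytes is cut to that length.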
  • nutchez-0.1/conf/sav/n.urls.txt

    --- r76
    +++ r80
    @@ -1,2 +1,2 @@
    -http://www.nchc.org.tw
    +http://www.nchc.org.tw/tw/
     http://www.hadoop.tw
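The seed-list change replaces the NCHC root URL with its /tw/ page. A minimal Python sketch of what that does to the seed set, assuming the file holds one URL per line (read_seeds and seed_hosts are hypothetical helpers, not part of Nutch):

```python
from urllib.parse import urlparse

# Seed lists before (r76) and after (r80) this changeset.
SEEDS_R76 = """http://www.nchc.org.tw
http://www.hadoop.tw
"""
SEEDS_R80 = """http://www.nchc.org.tw/tw/
http://www.hadoop.tw
"""


def read_seeds(text):
    """Return the non-blank lines of a seed list (assumed: one URL per line)."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def seed_hosts(urls):
    """Return the set of hostnames the seed URLs point at."""
    return {urlparse(url).hostname for url in urls}


old, new = read_seeds(SEEDS_R76), read_seeds(SEEDS_R80)
print(sorted(set(new) - set(old)))       # ['http://www.nchc.org.tw/tw/']
print(sorted(set(old) - set(new)))       # ['http://www.nchc.org.tw']
print(seed_hosts(old) == seed_hosts(new))  # True
```

The set of seed hosts is unchanged; only the starting page on www.nchc.org.tw moves.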