Changeset 80


Timestamp: Jun 9, 2009, 6:08:55 PM
Author: waue
Message: NutchEz ..0.1
Files: 2 added, 5 edited

  • nutchez-0.1/bin/nutchez-func.sh

    r77 r80
    @@ -169,5 +169,5 @@
       echo_vb "/opt/nutch/bin/nutch crawl ~/.nutchez/urls -dir ~/.nutchez/search -depth $DEPTH"
       echo_vb "nutch conf dir = $NUTCH_CONF_DIR"
    -  /opt/nutch/bin/nutch crawl ~/.nutchez/urls -dir ~/.nutchez/search -depth $DEPTH
    +  /opt/nutch/bin/nutch crawl ~/.nutchez/urls -dir ~/.nutchez/search -depth $DEPTH -topN 5000 -threads 1000
     }
     
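
The edited line hard-codes `-topN 5000 -threads 1000`. As a minimal sketch only, and not part of this changeset, the same command line could take those limits from variables with the r80 values as defaults. `build_crawl_cmd`, `TOPN` and `THREADS` are hypothetical names; the paths and the `-dir`, `-depth`, `-topN` and `-threads` options are the ones shown in the diff.

```shell
#!/bin/sh
# Hypothetical helper (not in nutchez-func.sh).
# Prints the crawl command for the depth given as $1; it does not run it.
# TOPN and THREADS fall back to the values hard-coded in r80.
build_crawl_cmd() {
  depth="$1"
  topn="${TOPN:-5000}"
  threads="${THREADS:-1000}"
  # The "~" stays literal inside the quotes, as in the echo_vb line above.
  printf '%s\n' "/opt/nutch/bin/nutch crawl ~/.nutchez/urls -dir ~/.nutchez/search -depth $depth -topN $topn -threads $threads"
}
```

Calling `build_crawl_cmd "$DEPTH"` with `TOPN` and `THREADS` unset prints the same command that r80 runs.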
  • nutchez-0.1/conf/nutch-site.xml

    r74 r80
    @@ -66,5 +66,20 @@
       </description>
     </property>
    -
    +<property>
    +  <name>db.ignore.external.links</name>
    +  <value>false</value>
    +  <description>If true, outlinks leading from a page to external hosts
    +  will be ignored. This is an effective way to limit the crawl to include
    +  only initially injected hosts, without creating complex URLFilters.
    +  </description>
    +</property>
    +<property>
    +  <name>file.content.limit</name>
    +  <value>1000000</value>
    +  <description>The length limit for downloaded content, in bytes.
    +  If this value is nonnegative (>=0), content longer than it will be truncated;
    +  otherwise, no truncation at all.
    +  </description>
    +</property>
     </configuration>
     
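
The `file.content.limit` description states a simple rule: a nonnegative limit truncates content to that many bytes, and a negative limit disables truncation. The sketch below only illustrates that rule on a local file; it is not Nutch code. `apply_content_limit` is a hypothetical name, and `head -c` is assumed to be available (it is in GNU and BSD userlands).

```shell
#!/bin/sh
# Hypothetical illustration of the documented rule; not how Nutch implements it.
# apply_content_limit LIMIT FILE
#   LIMIT >= 0 : write at most LIMIT bytes of FILE to stdout
#   LIMIT <  0 : write FILE unchanged
apply_content_limit() {
  limit="$1"
  file="$2"
  if [ "$limit" -ge 0 ]; then
    head -c "$limit" "$file"
  else
    cat "$file"
  fi
}
```

With the value added in r80, `apply_content_limit 1000000 some-file` would pass through at most 1000000 bytes.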
  • nutchez-0.1/conf/sav/n.urls.txt

    r76 r80
    @@ -1,2 +1,2 @@
    -http://www.nchc.org.tw
    +http://www.nchc.org.tw/tw/
     http://www.hadoop.tw
  • nutchez-0.1/debian/nutchez.install

    r73 r80
    @@ -1,5 +1,5 @@
     conf/*    etc/nutch
     bin   opt/nutch
    -bin/nutchez*  usr/local/sbin
    +bin/nutchez*  usr/local/bin
     lib   opt/nutch
     webapps   opt/nutch
  • nutchez-0.1/debian/nutchez.postrm

    r75 r80
    @@ -14,5 +14,5 @@
       if [ -d $i ];then
         echo "delete this dir :  $i"
    -    rm -ir $i
    +    rm -r $i
       fi
     done
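
The last hunk drops the `-i` flag, presumably so that removal no longer waits for a confirmation prompt. As a sketch under assumptions, and not part of this changeset, the same loop could also quote `$i` so that paths containing spaces are handled, and pass `--` so that a name beginning with `-` is not read as an option. `remove_dirs` and its argument list are hypothetical, since the list the real loop iterates over is not shown in this hunk.

```shell
#!/bin/sh
# Hypothetical variant of the postrm loop; the directory list is passed
# as arguments because the original list is outside the displayed hunk.
remove_dirs() {
  for i in "$@"; do
    # Only act on paths that exist and are directories.
    if [ -d "$i" ]; then
      echo "delete this dir :  $i"
      # Quoting keeps a path with spaces as one argument;
      # "--" ends option parsing before the path.
      rm -r -- "$i"
    fi
  done
}
```

For example, `remove_dirs "/tmp/example dir" /tmp/other` deletes each directory that exists and silently skips any that does not.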