Changes between Version 16 and Version 17 of waue/2009/nutch_install


Timestamp: Apr 24, 2009, 6:47:29 PM (15 years ago)
Author: waue
Comment: --

  • waue/2009/nutch_install

    v16 v17  
    101101= step 3 Edit the configuration files =
    102102 * All configuration files are under /opt/nutch/conf
    103 == 3.1 hadoop-env.sh ==
     103== 3.1 $NUTCH_HOME/conf/hadoop-env.sh ==
    104104 * Add the following lines anywhere in the existing hadoop-env.sh file
     105{{{
     106$ cd /opt/nutch/conf
     107$ gedit hadoop-env.sh
     108}}}
     109
    105110{{{
    106111#!sh
     
    116121 * Load the environment settings
    117122{{{
    118 $ source /opt/nutch/conf/hadoop-env.sh
     123$ source ./hadoop-env.sh
    119124}}}
    120125 * Note: writing these settings into /etc/bash.bashrc as well is strongly recommended, to be on the safe side.
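
One way to follow the /etc/bash.bashrc suggestion is to append a line that sources hadoop-env.sh, and to append it only once. The sketch below assumes the /opt/nutch/conf path used on this page; the function name `add_source_line` and the guarded `.` line it writes are illustrative choices, not part of the original instructions.

```shell
# add_source_line RC_FILE ENV_FILE
# Appends a line to RC_FILE that sources ENV_FILE if that file exists.
# Running it again with the same arguments does not add a duplicate line.
add_source_line() {
    rc_file=$1
    env_file=$2
    line="[ -f $env_file ] && . $env_file"
    # Create the rc file if it is missing so grep has something to read.
    touch "$rc_file"
    # -x matches whole lines, -F treats the line as a fixed string.
    grep -qxF -- "$line" "$rc_file" || printf '%s\n' "$line" >> "$rc_file"
}

# Example (assumed paths; writing to /etc/bash.bashrc needs root):
#   add_source_line /etc/bash.bashrc /opt/nutch/conf/hadoop-env.sh
```

Pointing the first argument at a per-user file such as ~/.bashrc would avoid the need for root, at the cost of covering only that user.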
    121126
    122127
    123 == 3.2 conf/nutch-site.xml ==
     128== 3.2 $NUTCH_HOME/conf/nutch-site.xml ==
    124129 * This is an important configuration file; the required settings are added below. For details on the other parameters, see nutch-default.xml
    125130{{{
    126 $ vim conf/nutch-site.xml
     131$ gedit nutch-site.xml
    127132}}}
    128133{{{
     
    198203}}}
    199204
    200 == 3.3 crawl-urlfilter.txt ==
     205== 3.3 $NUTCH_HOME/conf/crawl-urlfilter.txt ==
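    201206 * Edit the crawl filter rules. This file matters because a poorly configured filter leaves the crawl result almost empty, so the search engine ends up with nothing to find.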
    202207{{{
    203 $ vim conf/crawl-urlfilter.txt
     208$ gedit ./crawl-urlfilter.txt
    204209}}}
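
Before starting a crawl it can help to check that a seed URL would pass an accept rule. The sketch below approximates that check with `grep -E`. The pattern is a hypothetical rule written for the nchc.org.tw seed used later on this page, not the content of the file in this revision, and `url_is_accepted` is an illustrative helper name. Nutch evaluates its filter rules with Java regular expressions, so treat the result as a rough check only.

```shell
# Hypothetical accept rule in the "+<regex>" style of crawl-urlfilter.txt.
# The leading "+" is dropped because grep only needs the expression itself.
ACCEPT_PATTERN='^http://([a-z0-9]*\.)*nchc\.org\.tw/'

# Return 0 if the URL matches the accept pattern, non-zero otherwise.
url_is_accepted() {
    printf '%s\n' "$1" | grep -Eq "$ACCEPT_PATTERN"
}

# Example:
#   url_is_accepted "http://www.nchc.org.tw/" && echo "accepted"
```

If the seed URL fails a check like this, the filter is a likely cause of an empty crawl result.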
    205210{{{
     
    221226== 4.1 Edit the URL list ==
    222227{{{
     228$ cd /opt/nutch
    223229$ mkdir urls
    224230$ echo "http://www.nchc.org.tw" >> ./urls/urls.txt