= Notes on a successful Nutch test =
I. Environment: Ubuntu 7.10 + Tomcat 5.5 + JDK 1.5 + Nutch 0.9, on hosts hd01 through hd04 (hd01.nchc.org.tw ... hd04.nchc.org.tw)
II. Directory layout ::
 * /nutch -[[br]]
 * -/filesystem[[br]]
 * -/search[[br]]
 * -/local[[br]]
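The layout above can be created in one step. This is a minimal sketch, assuming a POSIX shell. `make_nutch_dirs` is a hypothetical helper name, and these notes do not state what each directory is for beyond what the later steps show (`search` holds the Nutch installation and `local` receives the crawl copied out of DFS).

```shell
#!/bin/sh
# Create the three subdirectories from section II under a given root.
# The root defaults to /nutch, as in these notes; pass another path to try it out.
make_nutch_dirs() {
    root="${1:-/nutch}"
    for sub in filesystem search local; do
        # mkdir -p is harmless if the directory already exists
        mkdir -p "$root/$sub" || return 1
    done
}
```

Calling `make_nutch_dirs /nutch` on each host (with write permission on `/`) gives every node the same tree before the `scp` step.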
III. Files to modify
 * /nutch/search/conf/[wiki:nutch.site.xml.nutch nutch-site.xml]
 * /nutch/search/conf/[wiki:hadoop.site.xml hadoop-site.xml]
 * /nutch/search/conf/[wiki:hadoop.env.sh hadoop-env.sh]
 * /nutch/search/conf/[wiki:crawl-urlfilter.txt]
 * /nutch/search/conf/[wiki:slaves]
 * /nutch/search/urls/urls.txt [wiki:urls.txt]
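Two of the files above are plain lists and can be generated. The sketch below is an assumption about their contents, since the linked wiki pages are not reproduced here: `slaves` is taken to hold one worker host name per line, and `urls.txt` one seed URL per line. Whether hd01 is also listed as a worker, the helper names, and the seed URL are all hypothetical.

```shell
#!/bin/sh
# Print host names hdNN for a numeric range, one per line (e.g. hd01 .. hd04).
list_hosts() {
    i="$1"
    last="$2"
    while [ "$i" -le "$last" ]; do
        printf 'hd%02d\n' "$i"
        i=$((i + 1))
    done
}

# Write conf/slaves and urls/urls.txt under the given search directory.
# Assumption: all four hosts, hd01 included, act as workers.
write_crawl_inputs() {
    search_dir="$1"
    seed_url="$2"
    mkdir -p "$search_dir/conf" "$search_dir/urls" || return 1
    list_hosts 1 4 > "$search_dir/conf/slaves"
    printf '%s\n' "$seed_url" > "$search_dir/urls/urls.txt"
}
```

For example, `write_crawl_inputs /nutch/search http://www.example.org/` would write both files in place. The remaining files (`nutch-site.xml`, `hadoop-site.xml`, `hadoop-env.sh`, `crawl-urlfilter.txt`) still need to be edited by hand as described on their wiki pages.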
IV. Commands, which must be run in this order (from the /nutch/search/ directory)
 1. scp -r /nutch/* hd02:/nutch/ (repeat for hd03 and hd04)
 2. bin/start-all.sh
 3. bin/hadoop dfs -put urls urls
 4. bin/nutch crawl urls -dir crawl01 -depth 3 >& logs/crawl01.log
 5. bin/hadoop dfs -copyToLocal crawl01 /nutch/local/
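Because the order matters, the five steps can be wrapped in a script that stops at the first failure. This is a sketch under stated assumptions, not the procedure actually used on these hosts. `run` and `deploy_and_crawl` are hypothetical names, the upload step uses the `bin/hadoop dfs -put` form to match the `dfs -copyToLocal` call in step 5, and the `>&` redirect is written as the portable `> file 2>&1`. With `DRY_RUN` left at its default of 1 the script only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (default) prints each command; DRY_RUN=0 actually executes it.
DRY_RUN="${DRY_RUN:-1}"

# Echo a command line, then run it through sh unless in dry-run mode.
run() {
    echo "+ $1"
    if [ "$DRY_RUN" = "0" ]; then
        sh -c "$1" || { echo "failed: $1" >&2; return 1; }
    fi
}

# Run the steps of section IV in order; meant to be called from /nutch/search/.
deploy_and_crawl() {
    crawl_dir="${1:-crawl01}"
    depth="${2:-3}"
    # Step 1: copy the installation to the other three hosts
    for host in hd02 hd03 hd04; do
        run "scp -r /nutch/* $host:/nutch/" || return 1
    done
    # Step 2: start the Hadoop daemons
    run "bin/start-all.sh" || return 1
    # Step 3: upload the seed URL directory into DFS
    run "bin/hadoop dfs -put urls urls" || return 1
    # Step 4: crawl, sending stdout and stderr to a log file
    run "bin/nutch crawl urls -dir $crawl_dir -depth $depth > logs/$crawl_dir.log 2>&1" || return 1
    # Step 5: copy the finished crawl out of DFS for the search front end
    run "bin/hadoop dfs -copyToLocal $crawl_dir /nutch/local/" || return 1
}
```

Running `deploy_and_crawl crawl01 3` first in dry-run mode shows the seven command lines; `DRY_RUN=0 deploy_and_crawl crawl01 3` executes them.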
V. Tomcat setup
 * Install Tomcat on hd01 (one machine is enough):
   * tar -zxvf apache-tomcat5.5.tar.gz -C /nutch/
   * mv /nutch/apache-tomcat5.5 /nutch/tomcat
 * Unpack the Nutch web application into /nutch/tomcat/webapps/ROOT/
   * jar -xvf nutch-0.9.war
 * Modify the Tomcat configuration
   * /nutch/tomcat/webapps/ROOT/WEB-INF/classes/nutch-site.xml [wiki:nutch.site.xml.tomcat differs from the copy in /nutch/search/conf]
   * /nutch/tomcat/conf/[wiki:server.xml]
 * Browse to hd01.nchc.org.tw:8080
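The Tomcat-side configuration differs from the crawler's because the web application searches the local copy of the crawl rather than DFS. The sketch below writes a configuration of that shape. It is an assumption modelled on the usual Nutch-on-Hadoop setup, not the content of the linked wiki page: `fs.default.name` set to `local`, and `searcher.dir` pointing at the directory produced by the `copyToLocal` step. `write_searcher_config` is a hypothetical helper name.

```shell
#!/bin/sh
# Write a minimal search-side Nutch configuration file.
#   $1: output file
#   $2: local crawl directory (defaults to /nutch/local/crawl01)
write_searcher_config() {
    out_file="$1"
    crawl_path="${2:-/nutch/local/crawl01}"
    cat > "$out_file" <<EOF
<?xml version="1.0"?>
<configuration>
  <!-- Assumption: search reads the local filesystem, not DFS -->
  <property>
    <name>fs.default.name</name>
    <value>local</value>
  </property>
  <!-- Directory that contains the copied crawl -->
  <property>
    <name>searcher.dir</name>
    <value>$crawl_path</value>
  </property>
</configuration>
EOF
}
```

The output file would be the one under `/nutch/tomcat/webapps/ROOT/WEB-INF/classes/`. Tomcat has to be restarted afterwards for the change to take effect.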