Version 6 (modified by shunfa, 15 years ago)
NutchEZ Install Test
Steps
- Place apache-tomcat-6.0.18.tar.gz and nutch-1.0.tar.gz in the /opt/ directory
- Run install.sh
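The two steps above can be sketched as a small helper that refuses to continue when either archive is missing. This is only an illustration: `stage_archives` is a hypothetical name, and where the archives and install.sh actually live is an assumption, since the page does not say.

```shell
#!/bin/sh
# stage_archives SRC_DIR DEST_DIR
# Copies the two archives the installer expects into DEST_DIR (/opt in the
# steps above). Returns non-zero if an archive is missing or a copy fails.
stage_archives() {
    src_dir=$1
    dest_dir=$2
    for f in apache-tomcat-6.0.18.tar.gz nutch-1.0.tar.gz; do
        if [ ! -f "$src_dir/$f" ]; then
            echo "missing archive: $src_dir/$f" >&2
            return 1
        fi
        cp "$src_dir/$f" "$dest_dir/" || return 1
    done
    return 0
}

# Possible usage, assuming the archives and install.sh sit in the current
# directory (an assumption, not something the page states):
#   stage_archives . /opt && sh ./install.sh
```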
Installation test results (virtual machine compared with machine 166)
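| Component | Check | File | Result |
|---|---|---|---|
| nutch | diff | hadoop-env.sh | done |
| nutch | diff | hadoop-site.xml | done |
| nutch | diff | nutch-site.xml | * |
| nutch | diff | slaves | client_install modifies this file |
| nutch | diff | crawl-urlfilter.txt | done |
| tomcat | diff | server.xml | done |
| tomcat | diff | nutch-site.xml | * |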
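A comparison like this one can be repeated with a short loop over the configuration files. The sketch below assumes both machines' configuration directories are reachable as local paths (for example after copying one of them over); `compare_configs` and its "same" / "differs or missing" labels are illustrative and do not correspond to the result markers used in the comparison.

```shell
#!/bin/sh
# compare_configs DIR_A DIR_B FILE...
# Runs diff on each named file in the two directories and prints one
# line per file.
compare_configs() {
    dir_a=$1
    dir_b=$2
    shift 2
    for f in "$@"; do
        if diff -q "$dir_a/$f" "$dir_b/$f" >/dev/null 2>&1; then
            echo "$f: same"
        else
            echo "$f: differs or missing"
        fi
    done
}

# Possible usage; LOCAL_CONF and REF_CONF are placeholders for the two
# configuration directories being compared:
#   compare_configs "$LOCAL_CONF" "$REF_CONF" hadoop-env.sh hadoop-site.xml \
#       nutch-site.xml slaves crawl-urlfilter.txt
```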
- If users enter the DNS name themselves, a typo may prevent the namenode or other services from starting properly
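One way to reduce that risk is to validate the name before it is written into any configuration file. The following is a minimal sketch, not part of install.sh: `valid_hostname_syntax` and `check_hostname` are hypothetical helpers, and the resolution check assumes a system that provides `getent` (typical on Linux).

```shell
#!/bin/sh
# Succeeds if the argument looks like a host name: dot-separated labels of
# letters, digits and hyphens, none starting or ending with a hyphen.
valid_hostname_syntax() {
    printf '%s\n' "$1" | grep -Eq \
        '^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?(\.[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?)*$'
}

# Succeeds only if the name is well formed and resolves on this machine.
check_hostname() {
    if ! valid_hostname_syntax "$1"; then
        echo "invalid hostname syntax: $1" >&2
        return 1
    fi
    if ! getent hosts "$1" >/dev/null 2>&1; then
        echo "hostname does not resolve: $1" >&2
        return 1
    fi
    return 0
}

# Possible usage in an interactive installer:
#   printf 'Master host name: '; read -r master
#   check_hostname "$master" || exit 1
```

A syntax check alone would not catch a misspelled but well-formed name, which is why the sketch also tries to resolve it.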
Execution
Upload urls
- bin/hadoop dfs -put urls urls
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /tmp/NutchEZ/logs/hadoop.log (Permission denied)
...something message...
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
put: org.apache.hadoop.security.AccessControlException: Permission denied: user=nutchuser, access=WRITE, inode="":root:supergroup:rwxr-xr-x
- Temporarily switched to root to continue testing
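The AccessControlException indicates that nutchuser has no write access to the HDFS root directory, which is owned by root. The log4j error looks like a separate, local problem: nutchuser cannot write /tmp/NutchEZ/logs/hadoop.log. As an alternative to running everything as root, the HDFS superuser could create a home directory for nutchuser. A minimal sketch, assuming the relative path `urls` resolves to `/user/nutchuser/urls` (the usual HDFS default) and that the function is run by the account that started the namenode; `make_hdfs_home` and the `HADOOP` variable are illustrative names.

```shell
#!/bin/sh
# Path to the hadoop launcher; override HADOOP to point elsewhere.
HADOOP=${HADOOP:-bin/hadoop}

# make_hdfs_home USER
# Creates /user/USER on HDFS and hands ownership to USER, so that USER
# can later write relative paths such as "urls".
make_hdfs_home() {
    user=$1
    $HADOOP dfs -mkdir "/user/$user" &&
    $HADOOP dfs -chown "$user" "/user/$user"
}

# Possible usage (run as the HDFS superuser), then retry the upload
# as nutchuser:
#   make_hdfs_home nutchuser
#   bin/hadoop dfs -put urls urls
```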
Crawl
- bin/nutch crawl urls -dir search -threads 2 -depth 3 -topN 100000
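For repeated test runs it may help to wrap the crawl command so that each option is named. The wrapper below is only a sketch; `run_crawl` and the `NUTCH` variable are hypothetical, and the comments describe the options as they are commonly documented for the Nutch 1.0 crawl tool.

```shell
#!/bin/sh
# Path to the nutch launcher; override NUTCH to point elsewhere.
NUTCH=${NUTCH:-bin/nutch}

# run_crawl SEED_DIR OUT_DIR THREADS DEPTH TOPN
run_crawl() {
    seed_dir=$1   # directory holding the seed URL list(s)
    out_dir=$2    # -dir: where the crawl data is written
    threads=$3    # -threads: number of fetcher threads
    depth=$4      # -depth: how many link levels to follow from the seeds
    top_n=$5      # -topN: maximum number of pages fetched per level
    $NUTCH crawl "$seed_dir" -dir "$out_dir" -threads "$threads" \
        -depth "$depth" -topN "$top_n"
}

# Equivalent to the command above:
#   run_crawl urls search 2 3 100000
```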
To do
- Run-time tests: crawling, file search, etc.