Version 8 (modified by shunfa, 15 years ago)
NutchEZ Install Test
Steps
- Place the install shell script and the *.tar.gz archive in the same directory
- Run install.sh
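The two steps above can be guarded by a small pre-flight check. This is only a sketch: the function name `check_install_dir` is hypothetical and not part of NutchEZ, and it assumes nothing about the archive beyond the `*.tar.gz` suffix.

```shell
# Hypothetical pre-flight check: confirm install.sh and at least one
# *.tar.gz archive sit in the same directory before running the installer.
check_install_dir() {
    dir="${1:-.}"
    if [ ! -f "$dir/install.sh" ]; then
        echo "missing: $dir/install.sh"
        return 1
    fi
    # ls exits non-zero when the glob matches nothing
    if ! ls "$dir"/*.tar.gz >/dev/null 2>&1; then
        echo "missing: no *.tar.gz archive in $dir"
        return 1
    fi
    echo "ok: $dir is ready"
}
```

A possible use is `check_install_dir . && sh ./install.sh`, so that the installer only starts when both files are present.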
Post-install checklist
Path | Items to check
---|---
/home/nutchuser/nutchez/source | Client install script (check the IP and hostname), client archive
/etc/hosts | Duplicate hostname entries must be commented out
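The /etc/hosts check can be done by listing every active line that mentions a given hostname. The helper below is a sketch: `hosts_entries` is a hypothetical name, and the hostname to look for is whatever the node actually uses.

```shell
# List active (uncommented) lines of a hosts file that name a given host.
# More than one line of output suggests a duplicate entry to comment out.
hosts_entries() {
    name="$1"
    file="${2:-/etc/hosts}"
    awk -v n="$name" '
        /^[[:space:]]*#/ { next }      # skip lines that are already comments
        {
            sub(/#.*/, "")             # drop any trailing comment
            # field 1 is the address; compare the remaining fields to the name
            for (i = 2; i <= NF; i++) if ($i == n) { print; break }
        }' "$file"
}
```

For example, `hosts_entries "$(hostname)"` prints the matching lines of /etc/hosts. If it prints two or more, the extra ones are candidates for commenting out.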
Testing
Ubuntu 10.04
- The Java check could print the following message to remind the user of the troubleshooting steps
add-apt-repository "deb http://archive.canonical.com/ lucid partner"
apt-get update
apt-get install sun-java6-jdk sun-java6-plugin
update-java-alternatives -s java-6-sun
Ubuntu 9.10
- The Java check could print the following message to remind the user of the troubleshooting steps
apt-get install sun-java6-jdk sun-java6-plugin
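One way the Java check could show these reminders is sketched below. The function names `java_hint` and `check_java` are hypothetical, and the hint text is limited to the commands listed above for each release. How install.sh actually detects Java is not shown here, so the `command -v java` test is an assumption.

```shell
# Print the troubleshooting commands recorded for a given Ubuntu release.
java_hint() {
    case "$1" in
        10.04)
            cat <<'EOF'
add-apt-repository "deb http://archive.canonical.com/ lucid partner"
apt-get update
apt-get install sun-java6-jdk sun-java6-plugin
update-java-alternatives -s java-6-sun
EOF
            ;;
        9.10)
            echo "apt-get install sun-java6-jdk sun-java6-plugin"
            ;;
        *)
            echo "No hint recorded for Ubuntu $1"
            ;;
    esac
}

# Assumed detection method: look for a java binary on PATH.
check_java() {
    if command -v java >/dev/null 2>&1; then
        echo "java found: $(command -v java)"
    else
        echo "java not found. Suggested steps:"
        java_hint "$1"
    fi
}
```

On Ubuntu the release number can be supplied with `check_java "$(lsb_release -rs)"`.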
Running
Upload urls
- bin/hadoop dfs -put urls urls
log4j:ERROR setFile(null,true) call failed.
java.io.FileNotFoundException: /tmp/NutchEZ/logs/hadoop.log (Permission denied)
...something message...
log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
put: org.apache.hadoop.security.AccessControlException: Permission denied: user=nutchuser, access=WRITE, inode="":root:supergroup:rwxr-xr-x
- Temporarily switched to root for testing
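The AccessControlException line already says who was denied and why, and the sketch below only pulls those fields apart. `explain_denied` and `suggest_home_fix` are hypothetical helpers. The second one prints (it does not run) `dfs -mkdir` and `dfs -chown` commands, on the assumption that giving nutchuser its own HDFS home directory under /user is an acceptable alternative to running as root. That fix has not been tested on this setup.

```shell
# Split an HDFS "Permission denied" line into its fields.
explain_denied() {
    printf '%s\n' "$1" | sed -n \
        's/.*user=\([^,]*\), access=\([A-Z_]*\), inode="\([^"]*\)":\([^:]*\):\([^:]*\):\(.*\)$/user=\1 access=\2 path="\3" owner=\4 group=\5 mode=\6/p'
}

# Print the commands an HDFS superuser could run to create a home
# directory for a user; relative HDFS paths resolve under /user/<name>.
suggest_home_fix() {
    echo "bin/hadoop dfs -mkdir /user/$1"
    echo "bin/hadoop dfs -chown $1 /user/$1"
}
```

Applied to the message above, `explain_denied` reports owner root with mode rwxr-xr-x, which means only root may write to that inode. The separate log4j error concerns the local file /tmp/NutchEZ/logs/hadoop.log, not HDFS, so it needs its own check of the local directory permissions.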
Crawl
- bin/nutch crawl urls -dir search -threads 2 -depth 3 -topN 100000
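To make the flags in the crawl command easier to adjust during testing, the same command can be assembled from named settings. This is a sketch: the variable names and `crawl_cmd` are hypothetical, and the function only prints the command rather than running it.

```shell
# Settings mirror the values used in the test run above.
URL_DIR=urls        # directory holding the seed URL list
OUT_DIR=search      # -dir: where the crawl output is written
THREADS=2           # -threads: number of parallel fetcher threads
DEPTH=3             # -depth: link depth to follow from the seed pages
TOPN=100000         # -topN: maximum pages fetched at each depth level

# Print the resulting command line (dry run).
crawl_cmd() {
    echo "bin/nutch crawl $URL_DIR -dir $OUT_DIR -threads $THREADS -depth $DEPTH -topN $TOPN"
}
```

Calling `crawl_cmd` prints the command for review. A smaller `TOPN` or `DEPTH` keeps a first test crawl short.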
To do
- Runtime tests such as crawling, searching files, etc.