wiki:shunfa/2010/0524_NutchEZ_InstallTest

Version 6 (modified by shunfa, 14 years ago)


NutchEZ Installation Test

Steps

  • Place apache-tomcat-6.0.18.tar.gz and nutch-1.0.tar.gz in the /opt/ directory
  • Run install.sh
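A minimal pre-flight check for the two steps above, sketched under the assumption that install.sh sits in the current directory and is executable. The helper name check_tarballs and its optional directory argument are hypothetical and not part of NutchEZ; it only confirms that both tarballs are in place before the installer runs.

```shell
# Hypothetical helper (not part of NutchEZ): confirm that both tarballs
# named in the installation steps are present. The directory defaults
# to /opt, as in the instructions.
check_tarballs() {
    dir=${1:-/opt}
    status=0
    for f in apache-tomcat-6.0.18.tar.gz nutch-1.0.tar.gz; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            status=1
        fi
    done
    return "$status"
}

# Usage, assuming install.sh is in the current directory and executable:
#   check_tarballs /opt && ./install.sh
```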

Installation test results (virtual machine compared against machine 166)

Component  File compared (diff)  Result
nutch      hadoop-env.sh         done
nutch      hadoop-site.xml       done
nutch      nutch-site.xml        *
nutch      slaves                modified by client_install
nutch      crawl-urlfilter.txt   done
tomcat     server.xml            done
tomcat     nutch-site.xml        *
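The diff checks in the table can be repeated with a short loop. This is a sketch that assumes copies of the configuration files from both machines have been gathered into two local directories; the function name compare_configs and the directory paths in the usage line are hypothetical.

```shell
# Hypothetical helper: compare the same configuration files in two
# directories (for example, copies from machine 166 and from the
# virtual machine) and report one status line per file.
compare_configs() {
    ref_dir=$1
    test_dir=$2
    shift 2
    for f in "$@"; do
        if [ ! -r "$ref_dir/$f" ] || [ ! -r "$test_dir/$f" ]; then
            echo "$f: missing"
        elif diff "$ref_dir/$f" "$test_dir/$f" >/dev/null; then
            echo "$f: same"
        else
            echo "$f: differs"
        fi
    done
}

# Usage (both paths are placeholders):
#   compare_configs /path/to/166/conf /path/to/vm/conf \
#       hadoop-env.sh hadoop-site.xml nutch-site.xml slaves crawl-urlfilter.txt
```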
  • If users type in the DNS name themselves, a typo can prevent the namenode or other services from starting correctly.
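One way to reduce the risk of a mistyped DNS name is to validate the input before install.sh writes it into any configuration file. The sketch below is an assumption about how that could look, not what install.sh does. The function names is_valid_hostname and host_resolves are hypothetical. The syntax check covers only the rules listed in its comments (the 63-character label limit is not enforced), and the resolution check assumes getent is available, as on glibc-based Linux.

```shell
# Hypothetical syntax check for a user-supplied host name.
is_valid_hostname() {
    # Reject empty names and names longer than 253 characters.
    [ -n "$1" ] && [ "${#1}" -le 253 ] || return 1
    case $1 in
        *[!A-Za-z0-9.-]*) return 1 ;;  # only letters, digits, '.' and '-'
        .*|*..*) return 1 ;;           # no empty labels
        -*|*.-*|*-|*-.*) return 1 ;;   # no label starts or ends with '-'
    esac
    return 0
}

# Hypothetical resolution check; assumes getent exists on the host.
host_resolves() {
    getent hosts "$1" >/dev/null 2>&1
}

# Usage inside an installer, before writing the name into the config:
#   is_valid_hostname "$name" && host_resolves "$name" || echo "bad host: $name"
```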

Execution

Uploading urls

  • bin/hadoop dfs -put urls urls
    log4j:ERROR setFile(null,true) call failed.
    java.io.FileNotFoundException: /tmp/NutchEZ/logs/hadoop.log (Permission denied)
    ...something message...
    log4j:ERROR Either File or DatePattern options are not set for appender [DRFA].
    put: org.apache.hadoop.security.AccessControlException: Permission denied: user=nutchuser, access=WRITE, inode="":root:supergroup:rwxr-xr-x
    
  • Switched to root temporarily to continue testing
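Because the target path is relative, `bin/hadoop dfs -put urls urls` writes into nutchuser's HDFS home directory, /user/nutchuser. The error shows that this write was refused on a directory owned by root:supergroup with mode rwxr-xr-x. One possible alternative to running as root is for the HDFS superuser (the account that started the namenode, root in this test) to create that home directory once and hand it to nutchuser. The sketch below assumes that setup. The function name prepare_hdfs_home and the HADOOP_CMD override are hypothetical, while `dfs -mkdir` and `dfs -chown` are standard hadoop dfs shell commands. The separate log4j messages concern the local file /tmp/NutchEZ/logs/hadoop.log and would need a local ownership change, which this sketch does not cover.

```shell
# Hypothetical helper: run as the HDFS superuser to give a user an HDFS
# home directory, so relative paths such as "urls" become writable.
# HADOOP_CMD is a hypothetical override; it defaults to bin/hadoop.
prepare_hdfs_home() {
    user=${1:-nutchuser}
    hadoop=${HADOOP_CMD:-bin/hadoop}
    "$hadoop" dfs -mkdir "/user/$user" &&
        "$hadoop" dfs -chown "$user" "/user/$user"
}

# Usage (as root, from the Nutch install directory):
#   prepare_hdfs_home nutchuser
# then, as nutchuser:
#   bin/hadoop dfs -put urls urls
```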

Crawling

  • bin/nutch crawl urls -dir search -threads 2 -depth 3 -topN 100000
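The crawl command above can be wrapped so that its parameters are easy to adjust between test runs. This is a sketch: the function name run_crawl and the variables URL_DIR, CRAWL_DIR, THREADS, DEPTH, TOPN and NUTCH_CMD are hypothetical, and their defaults reproduce the command used in this test. The comments summarize the standard Nutch 1.0 crawl options.

```shell
# Hypothetical wrapper around the one-step Nutch crawl command.
#   -dir      output directory (crawldb, linkdb, segments, indexes, ...)
#   -threads  number of fetcher threads
#   -depth    number of generate/fetch/update rounds (link depth from the seeds)
#   -topN     maximum number of URLs fetched in each round
run_crawl() {
    nutch=${NUTCH_CMD:-bin/nutch}
    "$nutch" crawl "${URL_DIR:-urls}" \
        -dir "${CRAWL_DIR:-search}" \
        -threads "${THREADS:-2}" \
        -depth "${DEPTH:-3}" \
        -topN "${TOPN:-100000}"
}

# Usage: a smaller trial crawl before the full run
#   DEPTH=1 TOPN=50 run_crawl
```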

To do

  • Runtime tests such as crawling and searching files