Changes between Version 8 and Version 9 of waue


Timestamp:
May 16, 2008, 7:21:52 PM (16 years ago)
Author:
waue
Comment:

--

  • waue

    v8 v9  
    66 * Check cps for errors
    77 == Work log ==
     8 === 5/16 ===
     9 * Thanks to sunni's guidance, nutch now builds successfully in Eclipse
     10    1. File ==> New ==> Project ==> Java Project ==> Next ==> Project name (set to nutch0.9) ==> Contents ==> Create project from existing source (select the directory where nutch is stored) ==> Finish.
     11    2. At this point 366 errors appear. Apply the fix found online: put two jars ( [http://nutch.cvs.sourceforge.net/*checkout*/nutch/nutch/src/plugin/parse-mp3/lib/jid3lib-0.5.1.jar jid3lib-0.5.1.jar] and [http://nutch.cvs.sourceforge.net/*checkout*/nutch/nutch/src/plugin/parse-rtf/lib/rtf-parser.jar rtf-parser.jar] ) into the lib folder of nutch-0.9. In Eclipse, right-click nutch0.9 ==> Properties... ==> Java Build Path ==> Libraries ==> Add External JARs... ==> select the two jars just downloaded ==> OK
     12    3. Many errors still remain at this point. The fix: in Eclipse, right-click nutch0.9 ==> Properties... ==> Java Build Path ==> Source ==> remove every entry shown with a folder icon and add only nutch/conf
     13    4. All errors should now be gone. Next, edit nutch-site.xml, crawl-urlfilter.txt, hadoop-site.xml, and hadoop-env.sh in nutch/conf, create urls/urls.txt under nutch/, and write the URLs to crawl into urls.txt
     14    5.  Menu Run > "Run..." ==> create "New" for "Java Application"
     15       * set in Main class =  org.apache.nutch.crawl.Crawl
     16       * on tab Arguments:
     17          * Program Arguments = urls -dir crawl -depth 3 -topN 50
     18          * in VM arguments: -Dhadoop.log.dir=logs -Dhadoop.log.file=hadoop.log
     19       * click on "Run"
     20
     21
    822 === 5/15 ===
    923 * building nutch in eclipse
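
Steps 4 and 5 of the 5/16 entry (creating the seed list and starting the crawl) can also be sketched from a terminal. This is only an illustration, not the procedure the author used: `NUTCH_HOME` and the seed URL `http://example.com/` are hypothetical placeholders, and the script assumes the stock `bin/nutch` launcher that ships with the nutch release. The crawl arguments are the same ones entered in the Eclipse run configuration.

```shell
#!/bin/sh
# Sketch only: NUTCH_HOME and SEED_URL are hypothetical placeholders,
# adjust them to the actual nutch directory and the site to crawl.
NUTCH_HOME="${NUTCH_HOME:-$PWD/nutch-0.9}"
SEED_URL="${SEED_URL:-http://example.com/}"

# Step 4: create urls/urls.txt under the nutch directory, one URL per line.
mkdir -p "$NUTCH_HOME/urls"
printf '%s\n' "$SEED_URL" > "$NUTCH_HOME/urls/urls.txt"

# Step 5: the same program arguments as in the Eclipse run configuration.
CRAWL_ARGS="urls -dir crawl -depth 3 -topN 50"

if [ -x "$NUTCH_HOME/bin/nutch" ]; then
    # Run from the nutch directory so the relative paths (urls, crawl) resolve.
    (cd "$NUTCH_HOME" && bin/nutch crawl $CRAWL_ARGS)
else
    # No launcher found: only show the command that would be run.
    echo "bin/nutch not found under $NUTCH_HOME; would run: bin/nutch crawl $CRAWL_ARGS"
fi
```

Running the crawl this way is handy for checking that the files edited in nutch/conf are valid independently of the Eclipse build path settings.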