= Nutch Research and Jottings =

== Preface ==
 - NutchEz, currently under development, already runs, but it offers only basic functionality, and some problems have turned up.
 - The hope is that reading through the official Nutch website in full will yield better ideas and ways to improve it.

== More Commands ==

=== readdb ===
{{{
$ nutch readdb /tmp/search/crawldb -stats

09/06/09 12:18:13 INFO mapred.MapTask: data buffer = 79691776/99614720
09/06/09 12:18:13 INFO mapred.MapTask: record buffer = 262144/327680
09/06/09 12:18:14 INFO crawl.CrawlDbReader: TOTAL urls: 1072
09/06/09 12:18:14 INFO crawl.CrawlDbReader: status 1 (db_unfetched): 1002
09/06/09 12:18:14 INFO crawl.CrawlDbReader: status 2 (db_fetched): 68
}}}

=== convdb ===
 - Not much use.

=== inject ===
{{{
}}}

=== readlinkdb ===
{{{
}}}

=== ===
{{{
}}}

=== readseg ===
 - read / dump segment data
 - Usage: SegmentReader (-dump ... | -list ... | -get ...) [general options]
   - SegmentReader -dump [general options]
   - SegmentReader -list ( ... | -dir ) [general options]
   - SegmentReader -get [general options]
{{{
}}}

=== updatedb ===
 - update crawl db from segments after fetching
 - Usage: CrawlDb (-dir | ...) [-force] [-normalize] [-filter] [-noAdditions]
{{{
$ nutch updatedb /tmp/search/crawldb/ -dir /tmp/search/segments/
}}}

=== dedup ===
 - remove duplicates from a set of segment indexes
 - Usage: DeleteDuplicates ...
{{{
$ nutch dedup /tmp/search/indexes/
}}}

== Notes ==
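
The `readdb -stats` log shown earlier is easier to compare across crawls once the counters are pulled out of the INFO lines. The following is only a minimal sketch: `summarize_crawldb_stats` is a hypothetical helper (not part of Nutch), and it assumes the log lines keep the same `TOTAL urls:` / `(db_unfetched)` / `(db_fetched)` layout as in the sample output.

```shell
#!/bin/sh
# Hypothetical helper, not part of Nutch: reads the log printed by
# `nutch readdb <crawldb> -stats` on stdin and prints the three
# counters on a single line. Counters that never appear print as 0.
summarize_crawldb_stats() {
    awk '
        /TOTAL urls:/       { total = $NF }      # last field is the count
        /\(db_unfetched\)/  { unfetched = $NF }
        /\(db_fetched\)/    { fetched = $NF }
        END {
            printf "total=%d unfetched=%d fetched=%d\n", total, unfetched, fetched
        }
    '
}
```

A possible use, assuming `nutch` is on the `PATH` (stderr is merged in because the log may be written there rather than to stdout):

 - `nutch readdb /tmp/search/crawldb -stats 2>&1 | summarize_crawldb_stats`

Fed the sample log from the readdb section, this prints `total=1072 unfetched=1002 fetched=68`.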