Changes between Version 3 and Version 4 of waue/2009/0609
Timestamp: Jun 9, 2009, 4:22:47 PM
waue/2009/0609
09/06/09 12:18:13 INFO mapred.MapTask: data buffer = 79691776/99614720
09/06/09 12:18:13 INFO mapred.MapTask: record buffer = 262144/327680
09/06/09 12:18:14 INFO crawl.CrawlDbReader: TOTAL urls: 1072
09/06/09 12:18:14 INFO crawl.CrawlDbReader: status 1 (db_unfetched): 1002
09/06/09 12:18:14 INFO crawl.CrawlDbReader: status 2 (db_fetched): 68
}}}
=== convdb ===
 - Not very useful
=== inject ===

{{{

}}}
=== readlinkdb ===
{{{

}}}

=== ===
{{{

}}}
=== readseg ===
 - read / dump segment data
 - Usage: SegmentReader (-dump ... | -list ... | -get ...) [general options]
   - SegmentReader -dump <segment_dir> <output> [general options]
   - SegmentReader -list (<segment_dir1> ... | -dir <segments>) [general options]
   - SegmentReader -get <segment_dir> <keyValue> [general options]
{{{

}}}
=== updatedb ===
 - update crawl db from segments after fetching
 - Usage: CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...) [-force] [-normalize] [-filter] [-noAdditions]
{{{
$ nutch updatedb /tmp/search/crawldb/ -dir /tmp/search/segments/
}}}
=== dedup ===
 - remove duplicates from a set of segment indexes
 - Usage: DeleteDuplicates <indexes> ...
{{{
$ nutch dedup /tmp/search/indexes/
}}}
== Notes ==
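The CrawlDbReader statistics lines logged earlier on this page can be reduced to plain numbers when checking how far a crawl has progressed. The following is only a minimal sketch: the helper name `crawl_stat` is made up for illustration, and the sample text is copied from the log lines on this page rather than read from a live Nutch run.

```shell
# Sample lines copied from the CrawlDbReader log output on this page.
log='09/06/09 12:18:14 INFO crawl.CrawlDbReader: TOTAL urls: 1072
09/06/09 12:18:14 INFO crawl.CrawlDbReader: status 1 (db_unfetched): 1002
09/06/09 12:18:14 INFO crawl.CrawlDbReader: status 2 (db_fetched): 68'

# crawl_stat (hypothetical helper): print the last field of the log
# line that contains the given literal label.
crawl_stat() {
    printf '%s\n' "$log" | grep -F "$1" | awk '{print $NF}'
}

total=$(crawl_stat 'TOTAL urls:')
unfetched=$(crawl_stat '(db_unfetched):')
fetched=$(crawl_stat '(db_fetched):')

echo "fetched $fetched of $total URLs, $unfetched still unfetched"
```

With a real log, the `log` variable could presumably be filled from a saved log file instead of the inline sample; the matching relies only on the literal labels seen in the lines above.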