Changes between Version 3 and Version 4 of waue/2009/0609


Timestamp:
Jun 9, 2009, 4:22:47 PM (15 years ago)
Author:
waue
Comment:

--

  • waue/2009/0609

    v3  v4
    13  13
    14  14  09/06/09 12:18:13 INFO mapred.MapTask: data buffer = 79691776/99614720
        15
    15  16  09/06/09 12:18:13 INFO mapred.MapTask: record buffer = 262144/327680
        17
    16  18  09/06/09 12:18:14 INFO crawl.CrawlDbReader: TOTAL urls: 1072
    17  19  09/06/09 12:18:14 INFO crawl.CrawlDbReader: status 1 (db_unfetched):    1002
        20
    18  21  09/06/09 12:18:14 INFO crawl.CrawlDbReader: status 2 (db_fetched):      68
        22
    19  23  }}}
    20  24  === convdb ===
        25  - not much use
        26  === inject ===
        27
        28  {{{
        29
        30  }}}
        31  === readlinkdb ===
        32  {{{
        33
        34  }}}
    21  35
    22  36  ===  ===
        37  {{{
    23  38
    24      ===  ===
        39  }}}
        40  === readseg ===
        41  - read / dump segment data
        42  - Usage: SegmentReader (-dump ... | -list ... | -get ...) [general options]
        43  - SegmentReader -dump <segment_dir> <output> [general options]
        44  - SegmentReader -list (<segment_dir1> ... | -dir <segments>) [general options]
        45  - SegmentReader -get <segment_dir> <keyValue> [general options]
        46  {{{
    25  47
    26      ===  ===
        48  }}}
        49  === updatedb ===
        50  - update crawl db from segments after fetching
        51  - Usage: CrawlDb <crawldb> (-dir <segments> | <seg1> <seg2> ...) [-force] [-normalize] [-filter] [-noAdditions]
        52  {{{
        53  $ nutch updatedb /tmp/search/crawldb/ -dir /tmp/search/segments/
        54  }}}
        55  === dedup ===
        56  - remove duplicates from a set of segment indexes
        57  - Usage: DeleteDuplicates <indexes> ...
        58  {{{
        59  $ nutch dedup /tmp/search/indexes/
    27  60
    28      ===  ===
    29
    30      ===  ===
    31
    32      ===  ===
    33
        61  }}}
    34  62  == Notes ==
    35  63
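
The readseg usage lines above have no example under them, unlike updatedb and dedup. What follows is a minimal sketch, not a recorded run. It assumes the segments live under /tmp/search/segments (the path used in the updatedb example) and that `nutch readseg` dispatches to SegmentReader the same way `nutch updatedb` dispatches to CrawlDb. The output directory /tmp/search/segdump and the helper `readseg_cmd` are hypothetical names introduced only for illustration. The script only prints the command lines, so it can be checked without a Nutch install.

```shell
# Sketch only: both paths are assumptions, not taken from a real run.
SEGMENTS=/tmp/search/segments   # same segments path as the updatedb example
OUT=/tmp/search/segdump         # hypothetical output directory for -dump

# Hypothetical helper: print the readseg command line for a given mode.
#   list          -> list every segment under $SEGMENTS (uses -dir)
#   dump <segdir> -> dump one segment directory into $OUT
readseg_cmd() {
  case "$1" in
    list) echo "nutch readseg -list -dir $SEGMENTS" ;;
    dump) echo "nutch readseg -dump $2 $OUT" ;;
    *)    echo "unknown mode: $1" >&2; return 1 ;;
  esac
}

# Print the two command lines; <segment_dir> is left as a placeholder.
readseg_cmd list
readseg_cmd dump "$SEGMENTS/<segment_dir>"
```

To actually run one of them, copy the printed line into a shell where `nutch` is on the PATH and replace `<segment_dir>` with a real segment directory name.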