Changes between Version 5 and Version 6 of waue/2009/0609


Timestamp: Jun 9, 2009, 4:39:34 PM
Author: waue
Comment: --

  • waue/2009/0609

    v5  v6
     9   9  === readdb ===
    10  10   - read / dump crawl db
    11       - Usage: CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
        11   - Usage: !CrawlDbReader <crawldb> (-stats | -dump <out_dir> | -topN <nnnn> <out_dir> [<min>] | -url <url>)
    12  12   - -stats [-sort]       print overall statistics to System.out
    13  13  {{{

    21  21  }}}
    22  22   - -dump <out_dir> [-format normal|csv ]        dump the whole db to a text file in <out_dir>
        23  {{{
        24  $ nutch readdb /tmp/search/crawldb/ -dump ./dump
        25  $ vim ./dump/part-00000
        26  }}}
    23  27   - -url <url>   print information on <url> to System.out
        28  {{{
        29  $ nutch readdb /tmp/search/crawldb/ -url http://www.nchc.org.tw/tw/
        30  URL: http://www.nchc.org.tw/tw/
        31
        32  Version: 7
        33
        34  Status: 6 (db_notmodified)
        35
        36  Fetch time: Thu Jul 09 14:34:48 CST 2009
        37
        38  Modified time: Thu Jan 01 08:00:00 CST 1970
        39
        40  Retries since fetch: 0
        41
        42  Retry interval: 2592000 seconds (30 days)
        43
        44  Score: 3.1152809
        45
        46  Signature: ce0202bbd593b09b86ce8a9aa991b321
        47
        48  Metadata: _pst_: success(1), lastModified=0
        49  }}}
    24  50   - -topN <nnnn> <out_dir> [<min>]       dump top <nnnn> urls sorted by score to <out_dir>
    25  51
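The `-url` record shown earlier in this section is a list of `Name: value` lines, so individual fields can be pulled out with standard shell tools. The following is only a minimal sketch: the sample text is copied from that output, `get_field` is a hypothetical helper (not part of Nutch), and it assumes the output keeps this layout.

```shell
#!/bin/sh
# Sample record: a few fields copied from the `readdb -url` output above.
# Assumption: in practice this text would be captured from the command itself.
record='URL: http://www.nchc.org.tw/tw/
Status: 6 (db_notmodified)
Retry interval: 2592000 seconds (30 days)
Score: 3.1152809'

# get_field NAME: print the value that follows "NAME: " (hypothetical helper).
get_field() {
    printf '%s\n' "$record" | awk -F': ' -v key="$1" '$1 == key { print $2 }'
}

score=$(get_field "Score")
status=$(get_field "Status")

# The retry interval is reported in seconds; keep only the leading number
# and convert it to whole days (86400 seconds per day).
interval_s=$(get_field "Retry interval" | awk '{ print $1 }')
interval_days=$((interval_s / 86400))

echo "score=$score status=$status retry_days=$interval_days"
```

Splitting on `": "` rather than on `":"` alone keeps values such as the URL and the fetch time intact, since their colons are not followed by a space.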
    26  52  === inject ===
    27  53   - inject new urls into the database
    28       - Usage: Injector <crawldb> <url_dir>
        54   - Usage: !Injector <crawldb> <url_dir>
    29  55
    30  56  === readlinkdb ===