Changes between Version 25 and Version 26 of jazz/08-11-05


Timestamp:
Nov 5, 2008, 2:28:12 PM
Author:
jazz

  • jazz/08-11-05

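   * Hadoop splits the input into many !InputSplit chunks, but you may run into the problem that the data you need to process sits in a different !InputSplit.
 * The recommended number of reducers is 0.95 * num_nodes * mapred.tasktracker.tasks.maximum. The 0.95 factor reserves 5% of the time to absorb the impact of failures on other nodes.
 * hadoop-stream:
   * Its ability to handle binary data is still limited, so it is best used for plain-text processing.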
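   * Example: word count with shell scripts as the mapper and reducer:
{{{
# Mapper: put each word on its own line and drop empty lines.
# Single quotes keep the inner double quotes and \n intact in the script file.
~/hadoop-0.18.2$ echo 'sed -e "s/ /\n/g" | grep .' > streamingMapper.sh
# Reducer: count identical adjacent words and print "word<TAB>count".
# \$2 and \$1 are escaped so the interactive shell does not expand them.
~/hadoop-0.18.2$ echo "uniq -c | awk '{print \$2 \"\t\" \$1}'" > streamingReducer.sh
~/hadoop-0.18.2$ bin/hadoop jar ./contrib/streaming/hadoop-0.18.2-streaming.jar -input conf -output out1 -mapper streamingMapper.sh -reducer streamingReducer.sh -file ./streamingMapper.sh -file ./streamingReducer.sh
}}}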
 * [http://www.hadoop.tw/2008/09/php-hadoop.html Developing Hadoop programs with a "single machine" and "PHP"]
 * Todo:
   * get "Hadoop Cookbook"
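
The streaming word-count job in the list above can be sanity-checked without a cluster by emulating map, sort, and reduce with an ordinary pipe. This is only a rough sketch: the function names `mapper`, `reducer`, and `wordcount` are made up for illustration, `tr` stands in for the `sed` expression as a more portable way to split on spaces, and a local `sort` only approximates the key ordering Hadoop performs between the two phases.

```shell
#!/bin/sh
# Local dry run of a streaming word count (hypothetical helper names).

# Map step: one word per line, empty lines dropped.
mapper() { tr ' ' '\n' | grep . ; }

# Reduce step: count adjacent identical words, emit "word<TAB>count".
reducer() { uniq -c | awk '{print $2 "\t" $1}' ; }

# sort stands in for the shuffle/sort between map and reduce.
wordcount() { mapper | sort | reducer ; }

# Tiny sample input: the word "a" appears three times.
printf 'a b a\nc a\n' | wordcount
```

If the pipe produces sensible counts on a small sample, the same two commands are reasonable candidates for the `-mapper` and `-reducer` scripts.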
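The reducer-count rule of thumb noted earlier (0.95 * num_nodes * mapred.tasktracker.tasks.maximum) is easy to evaluate in the shell. The sketch below is an illustration only: `reducer_count` is a hypothetical helper, and rounding down to a whole number is an assumption, since the note does not say how to round.

```shell
#!/bin/sh
# reducer_count NODES TASKS_PER_NODE
# Prints floor(0.95 * NODES * TASKS_PER_NODE).
reducer_count() {
    # Integer arithmetic: multiply by 95, then divide by 100, to avoid floats.
    echo $(( $1 * $2 * 95 / 100 ))
}

# Example: 10 nodes with 2 task slots each.
reducer_count 10 2   # prints 19
```
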
== Deploy Hadoop with DRBL ==