Changes between Version 25 and Version 26 of jazz/08-11-05
- Timestamp: Nov 5, 2008, 2:28:12 PM
jazz/08-11-05
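The hadoop-stream word count in the notes below can be dry-run on a single machine before it is submitted with `bin/hadoop jar`. What follows is a minimal sketch, not part of the original notes. It assumes GNU sed, which turns `\n` in the replacement into a newline. The temporary directory and the two-line sample input are hypothetical test fixtures. `sort` stands in for Hadoop's shuffle, which hands the reducer its input sorted by key.

{{{
# Dry run of the hadoop-stream word count on one machine (no Hadoop needed).
# The work directory and the sample text are hypothetical test fixtures.
work=$(mktemp -d)

# Same mapper/reducer as in the notes; the quoted 'EOF' keeps \n, $1 and $2 literal.
cat > "$work/streamingMapper.sh" <<'EOF'
sed -e "s/ /\n/g" | grep .
EOF
cat > "$work/streamingReducer.sh" <<'EOF'
uniq -c | awk '{print $2 "\t" $1}'
EOF

# Stand-in for the files under conf/.
printf 'hadoop streaming test\nhadoop test\n' > "$work/input.txt"

# map -> sort (emulating the shuffle, which delivers keys in sorted order) -> reduce
sh "$work/streamingMapper.sh" < "$work/input.txt" |
  LC_ALL=C sort |
  sh "$work/streamingReducer.sh" > "$work/result.txt"

# prints: hadoop<TAB>2, streaming<TAB>1, test<TAB>2 (one per line)
cat "$work/result.txt"
}}}

The `sort` between the stages matters because `uniq -c` only merges adjacent duplicate lines. Without it, the reducer would report the same word several times.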
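 * Hadoop splits the input into many !InputSplit chunks, but you can run into trouble when the data a task needs to process continues into another !InputSplit.
 * The recommended number of Reducers is 0.95 * num_nodes * mapred.tasktracker.tasks.maximum. The factor 0.95 keeps about 5% of the reduce slots in reserve to absorb the impact of failing nodes (tasks that have to be re-run elsewhere).
 * hadoop-stream:
   * Its support for binary data is still limited, so it is best suited to plain-text processing.
   * Example: a word count over the files in `conf`. Quote the commands so that the interactive shell writes `\n`, `$1` and `$2` into the scripts literally instead of expanding them:
{{{
~/hadoop-0.18.2$ echo 'sed -e "s/ /\n/g" | grep .' > streamingMapper.sh
~/hadoop-0.18.2$ echo "uniq -c | awk '{print \$2 \"\t\" \$1}'" > streamingReducer.sh
~/hadoop-0.18.2$ bin/hadoop jar ./contrib/streaming/hadoop-0.18.2-streaming.jar -input conf -output out1 -mapper streamingMapper.sh -reducer streamingReducer.sh -file ./streamingMapper.sh -file ./streamingReducer.sh
}}}
 * [http://www.hadoop.tw/2008/09/php-hadoop.html Developing Hadoop programs on a "single machine" with "PHP"]
 * Todo:
   * get "Hadoop Cookbook"

== Deploy Hadoop with DRBL ==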