Changes between Version 1 and Version 2 of jazz/12-01-19
- Timestamp: Jan 19, 2012, 9:47:15 AM
jazz/12-01-19
 * [http://info.cloudera.com/2012117NewsletterJanuary2012JAN_20120117Newsletter-January-2012JAN.html How many reducers should I use for my job?] - Cloudera recommends setting the number of reducers to 0.95 ~ 1.75 * (number of nodes * the value of the mapred.tasktracker.tasks.maximum parameter)
{{{
The answer to this depends on the job, of course, but in general, if too many small files are created,
this will cause more time spent in I/O. A lower number of reducers will create fewer, but larger, output
files. A good rule of thumb is to tune the number of reducers so that the output files are at least a
half block size.

The right number of reducers seems to be between 0.95 or 1.75 multiplied by
(nodes * mapred.tasktracker.tasks.maximum). At a multiplier of 0.95 all of the reducers can launch
immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish
their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
}}}
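The two rules of thumb quoted above can be turned into a quick calculation. The following is only a minimal sketch, not part of the Cloudera newsletter: the function names `reducer_range` and `max_reducers_for_output` are hypothetical, `slots_per_node` stands in for the value of mapred.tasktracker.tasks.maximum, and the cluster and data sizes in the usage note are made-up examples.

```python
def reducer_range(nodes, slots_per_node):
    """Return (low, high) reducer counts from the 0.95 / 1.75 multipliers.

    slots_per_node is assumed to be the value configured for
    mapred.tasktracker.tasks.maximum on each node.
    """
    slots = nodes * slots_per_node
    # Integer arithmetic avoids floating-point rounding surprises.
    low = max(1, (95 * slots) // 100)    # all reducers launch in one wave
    high = max(1, (175 * slots) // 100)  # faster nodes run a second wave
    return low, high


def max_reducers_for_output(total_output_bytes, block_size_bytes):
    """Largest reducer count that keeps the average output file
    at least half a block in size (assumes evenly sized outputs)."""
    half_block = block_size_bytes // 2
    return max(1, total_output_bytes // half_block)
```

For example, on a hypothetical 10-node cluster with 2 slots per node, `reducer_range(10, 2)` gives `(19, 35)`. If the job is expected to write about 10 GiB with a 64 MiB block size, `max_reducers_for_output(10 * 1024**3, 64 * 1024**2)` gives `320`, so in that case the slot-based range is the tighter constraint.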