Changes between Version 1 and Version 2 of jazz/12-01-19


Timestamp:
Jan 19, 2012, 9:47:15 AM (13 years ago)
Author:
jazz

  • jazz/12-01-19

    v1 v2  
    55 * [http://info.cloudera.com/2012117NewsletterJanuary2012JAN_20120117Newsletter-January-2012JAN.html How many reducers should I use for my job?] - Cloudera recommends setting the number of reducers to between 0.95 and 1.75 times (number of nodes * the configured value of mapred.tasktracker.tasks.maximum)
    66{{{
    7 The answer to this depends on the job, of course, but in general, if too many small files are created, this will cause more time spent in I/O. A lower number of reducers will create fewer, but larger, output files. A good rule of thumb is to tune the number of reducers so that the output files are at least a half block size.
     7The answer to this depends on the job, of course, but in general, if too many small files are created,
     8this will cause more time spent in I/O. A lower number of reducers will create fewer, but larger, output
     9files. A good rule of thumb is to tune the number of reducers so that the output files are at least a
     10half block size.
    811
    9 The right number of reducers seems to be between 0.95 or 1.75 multiplied by (nodes * mapred.tasktracker.tasks.maximum). At a multiplier of 0.95 all of the reducers can launch immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
     12The right number of reducers seems to be between 0.95 or 1.75 multiplied by
     13(nodes * mapred.tasktracker.tasks.maximum). At a multiplier of 0.95 all of the reducers can launch
     14immediately and start transferring map outputs as the maps finish. At 1.75 the faster nodes will finish
     15their first round of reduces and launch a second round of reduces doing a much better job of load balancing.
    1016}}}
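
The two rules of thumb quoted above can be turned into a quick calculation. The following is only a minimal sketch: the function names and parameters are hypothetical, `reduce_slots_per_node` stands in for the `mapred.tasktracker.tasks.maximum` value as named in the quoted text, and the output-size check assumes reducer output is spread evenly across files.

```python
def reducer_range(nodes, reduce_slots_per_node):
    """Return (low, high) reducer counts for the 0.95x and 1.75x multipliers.

    Integer arithmetic is used so the result is not affected by
    floating-point rounding. Both values are rounded down.
    """
    capacity = nodes * reduce_slots_per_node      # total reduce slots in the cluster
    low = max(1, capacity * 95 // 100)            # all reducers launch in one wave
    high = max(1, capacity * 175 // 100)          # faster nodes run a second wave
    return low, high


def max_reducers_for_output(total_output_bytes, block_size_bytes):
    """Largest reducer count that keeps each output file >= half a block.

    Assumes (hypothetically) that output is divided evenly among reducers.
    """
    half_block = block_size_bytes // 2
    return max(1, total_output_bytes // half_block)


if __name__ == "__main__":
    low, high = reducer_range(nodes=10, reduce_slots_per_node=2)
    print("reducers by slot rule:", low, "to", high)

    # Example: 10 GiB of expected job output with a 64 MiB block size.
    cap = max_reducers_for_output(10 * 1024**3, 64 * 1024**2)
    print("upper bound from half-block rule:", cap)
```

If the slot-based range exceeds the half-block bound, the quoted advice suggests leaning toward the smaller number so that output files do not become too small.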