wiki:jazz/12-01-19

2012-01-19

Hadoop

  • How many reducers should I use for my job? - Cloudera recommends setting the number of reducers to 0.95 ~ 1.75 * (number of nodes * the value of the mapred.tasktracker.tasks.maximum parameter)
    The answer depends on the job, of course, but in general, creating too many small output files 
    means more time spent on I/O. Fewer reducers produce fewer, but larger, output files. A good rule 
    of thumb is to tune the number of reducers so that each output file is at least half a block 
    in size.
    
    The right number of reducers seems to be between 0.95 and 1.75 multiplied by 
    (nodes * mapred.tasktracker.tasks.maximum). At a multiplier of 0.95, all of the reducers can launch 
    immediately and start transferring map outputs as the maps finish. At 1.75, the faster nodes will finish 
    their first round of reduces and launch a second round, which does a much better job of load balancing. 
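
The rule of thumb above can be sketched as a small calculation. This is only an illustration, not part of Hadoop: the function names `reducer_range` and `cap_by_output_size` are hypothetical, the per-node slot count stands in for the mapred.tasktracker.tasks.maximum value quoted above (later Hadoop 1.x releases appear to name the reduce-side limit mapred.tasktracker.reduce.tasks.maximum), and the 64 MB block size is an assumed default that should be replaced with the cluster's actual setting.

```python
def reducer_range(nodes, reduce_slots_per_node):
    """Return (low, high) reducer counts for the 0.95x and 1.75x multipliers.

    Integer arithmetic is used so the result does not depend on
    floating-point rounding.
    """
    total_slots = nodes * reduce_slots_per_node
    low = max(1, total_slots * 95 // 100)    # one wave: every reducer starts at once
    high = max(1, total_slots * 175 // 100)  # about two waves: better load balancing
    return low, high


def cap_by_output_size(reducers, expected_output_bytes,
                       block_size_bytes=64 * 1024 * 1024):
    """Lower the reducer count so each output file is at least half a block.

    block_size_bytes is an assumption (64 MB); use the cluster's real value.
    """
    half_block = block_size_bytes // 2
    max_reducers = max(1, expected_output_bytes // half_block)
    return min(reducers, max_reducers)


if __name__ == "__main__":
    low, high = reducer_range(nodes=10, reduce_slots_per_node=2)
    print("reducer range:", low, "to", high)
    # With roughly 1 GiB of expected job output, keep files >= half a block.
    print("capped high end:", cap_by_output_size(high, 1024 ** 3))
```

The chosen count would then be passed to the job, for example through the mapred.reduce.tasks property or `JobConf.setNumReduceTasks`. Whether the low or high end works better still depends on the job, as the answer above notes.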
    
Last modified on Jan 19, 2012, 9:47:15 AM