== A Dynamic MapReduce Scheduler for Heterogeneous Workloads ==

Abstract: MapReduce is an important programming model in data centers
containing tens of thousands of nodes. In a practical data center of that
scale, I/O-bound jobs and CPU-bound jobs, which demand different resources,
commonly run simultaneously in the same cluster. The MapReduce framework
has not addressed how to run these two kinds of jobs in parallel. In this
paper, we give a new view of the MapReduce model and classify MapReduce
workloads into three categories based on their CPU and I/O utilization.
Building on this workload classification, we design a new dynamic MapReduce
workload prediction mechanism, MR-Predict, which detects the workload type
on the fly. We propose a Triple-Queue Scheduler based on the MR-Predict
mechanism. The Triple-Queue Scheduler can improve the usage of both CPU and
disk I/O resources under heterogeneous workloads, and it can improve Hadoop
throughput by about 30% under such workloads.
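To make the idea of one queue per workload class concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the paper's algorithm: the abstract does not give the classification rule or name the third category, so the `threshold` value, the `UNDETERMINED` label, the `Job` fields, and the "one job per queue" dispatch policy are all hypothetical. It also classifies from fixed utilization samples, whereas MR-Predict is described as detecting the workload type on the fly.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum


class WorkloadType(Enum):
    CPU_BOUND = "cpu"
    IO_BOUND = "io"
    UNDETERMINED = "undetermined"  # hypothetical name for the third category


@dataclass
class Job:
    name: str
    cpu_util: float  # sampled CPU utilization, 0.0 to 1.0 (assumed input)
    io_util: float   # sampled disk I/O utilization, 0.0 to 1.0 (assumed input)


def classify(job: Job, threshold: float = 0.7) -> WorkloadType:
    """Assumed rule: a job is bound by a resource if only that one is heavily used."""
    cpu_heavy = job.cpu_util >= threshold
    io_heavy = job.io_util >= threshold
    if cpu_heavy and not io_heavy:
        return WorkloadType.CPU_BOUND
    if io_heavy and not cpu_heavy:
        return WorkloadType.IO_BOUND
    return WorkloadType.UNDETERMINED


class TripleQueueScheduler:
    """Keeps one FIFO queue per workload class."""

    def __init__(self) -> None:
        self.queues = {t: deque() for t in WorkloadType}

    def submit(self, job: Job) -> None:
        self.queues[classify(job)].append(job)

    def next_batch(self) -> list:
        # Take at most one job from each non-empty queue, so that jobs
        # with different resource demands are dispatched together.
        return [q.popleft() for q in self.queues.values() if q]
```

With this sketch, submitting two CPU-heavy jobs and one I/O-heavy job yields a first batch containing one job of each kind, and the second CPU-heavy job waits for the next batch instead of competing for the same resource.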