}}}
[[PageOutline]]
= 1. =
* A Dynamic MapReduce Scheduler for Heterogeneous Workloads
Abstract: MapReduce is an important programming model for building data centers containing tens of thousands of nodes.
In a practical data center of that scale, I/O-bound jobs and CPU-bound jobs, which demand different resources,
commonly run simultaneously in the same cluster. The MapReduce framework has not addressed the parallelization
of these two kinds of jobs. In this paper, we give a new view of the MapReduce model and classify MapReduce
workloads into three categories based on their CPU and I/O utilization. Building on this workload classification,
we design a new dynamic MapReduce workload prediction mechanism, MR-Predict, which detects the workload type on the fly.
We propose a Triple-Queue Scheduler based on the MR-Predict mechanism. The Triple-Queue Scheduler can improve the
usage of both CPU and disk I/O resources under heterogeneous workloads, and it can improve Hadoop throughput by
about 30% under such workloads.
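
To make the queueing idea concrete, here is a minimal sketch of a scheduler that keeps separate queues for CPU-bound and I/O-bound jobs and alternates between them. It is not the paper's algorithm: the abstract does not say how MR-Predict classifies jobs or what the three queues hold. The `Job` and `TripleQueueSketch` classes, the `IO_BOUND_THRESHOLD` value, the "waiting" queue for unclassified jobs, and the alternating dispatch policy are all assumptions made for illustration.

{{{
#!python
from collections import deque
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off: jobs whose measured I/O utilization is at or above
# this fraction are treated as I/O-bound. The abstract gives no such value.
IO_BOUND_THRESHOLD = 0.8


@dataclass
class Job:
    name: str
    io_utilization: Optional[float] = None  # None until a sample is taken


class TripleQueueSketch:
    """Illustrative three-queue scheduler (assumed design, not the paper's)."""

    def __init__(self) -> None:
        self.waiting = deque()    # jobs not yet classified (assumption)
        self.cpu_bound = deque()
        self.io_bound = deque()
        self._next_is_cpu = True  # which classified queue to try first

    def submit(self, job: Job) -> None:
        # New jobs wait until a utilization sample is available.
        self.waiting.append(job)

    def classify(self, job: Job, io_utilization: float) -> None:
        # Move a waiting job into the queue matching its measured behaviour.
        self.waiting.remove(job)
        job.io_utilization = io_utilization
        if io_utilization >= IO_BOUND_THRESHOLD:
            self.io_bound.append(job)
        else:
            self.cpu_bound.append(job)

    def next_job(self) -> Optional[Job]:
        # Alternate between the queues so CPU and disk are both kept busy;
        # fall back to the other queue when the preferred one is empty.
        if self._next_is_cpu:
            order = (self.cpu_bound, self.io_bound)
        else:
            order = (self.io_bound, self.cpu_bound)
        self._next_is_cpu = not self._next_is_cpu
        for queue in order:
            if queue:
                return queue.popleft()
        return None
}}}

In this sketch a job is classified once, from a single utilization sample. A mechanism that detects the workload type "on the fly" would presumably re-sample running jobs, which is omitted here.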