TY - JOUR
T1 - Improving MapReduce scheduler for heterogeneous workloads in a heterogeneous environment
AU - Jeyaraj, Rathinaraja
AU - Ananthanarayana, V. S.
AU - Paul, Anand
N1 - Publisher Copyright:
© 2019 John Wiley & Sons, Ltd.
PY - 2020/4/10
Y1 - 2020/4/10
N2 - Big data is largely influencing business entities and research sectors to be more data-driven. Hadoop MapReduce is one of the cost-effective ways to process large scale datasets and offered as a service over the Internet. Even though cloud service providers promise an infinite amount of resources available on-demand, it is inevitable that some of the hired virtual resources for MapReduce are left unutilized and makespan is limited due to various heterogeneities that exist while offering MapReduce as a service. As MapReduce v2 allows users to define the size of containers for the map and reduce tasks, jobs in a batch become heterogeneous and behave differently. Also, the different capacity of virtual machines in the MapReduce virtual cluster accommodate a varying number of map/reduce tasks. These factors highly affect resource utilization in the virtual cluster and the makespan for a batch of MapReduce jobs. Default MapReduce job schedulers do not consider these heterogeneities that exist in a cloud environment. Moreover, virtual machines in MapReduce virtual cluster process an equal number of blocks regardless of their capacity, which affects the makespan. Therefore, we devised a heuristic-based MapReduce job scheduler that exploits virtual machine and MapReduce workload level heterogeneities to improve resource utilization and makespan. We proposed two methods to achieve this: (i) roulette wheel scheme based data block placement in heterogeneous virtual machines, and (ii) a constrained 2-dimensional bin packing to place heterogeneous map/reduce tasks. We compared heuristic-based MapReduce job scheduler against the classical fair scheduler in MapReduce v2. Experimental results showed that our proposed scheduler improved makespan and resource utilization by 45.6% and 47.9% over classical fair scheduler.
AB - Big data is largely influencing business entities and research sectors to be more data-driven. Hadoop MapReduce is one of the cost-effective ways to process large scale datasets and offered as a service over the Internet. Even though cloud service providers promise an infinite amount of resources available on-demand, it is inevitable that some of the hired virtual resources for MapReduce are left unutilized and makespan is limited due to various heterogeneities that exist while offering MapReduce as a service. As MapReduce v2 allows users to define the size of containers for the map and reduce tasks, jobs in a batch become heterogeneous and behave differently. Also, the different capacity of virtual machines in the MapReduce virtual cluster accommodate a varying number of map/reduce tasks. These factors highly affect resource utilization in the virtual cluster and the makespan for a batch of MapReduce jobs. Default MapReduce job schedulers do not consider these heterogeneities that exist in a cloud environment. Moreover, virtual machines in MapReduce virtual cluster process an equal number of blocks regardless of their capacity, which affects the makespan. Therefore, we devised a heuristic-based MapReduce job scheduler that exploits virtual machine and MapReduce workload level heterogeneities to improve resource utilization and makespan. We proposed two methods to achieve this: (i) roulette wheel scheme based data block placement in heterogeneous virtual machines, and (ii) a constrained 2-dimensional bin packing to place heterogeneous map/reduce tasks. We compared heuristic-based MapReduce job scheduler against the classical fair scheduler in MapReduce v2. Experimental results showed that our proposed scheduler improved makespan and resource utilization by 45.6% and 47.9% over classical fair scheduler.
KW - bin packing
KW - heterogeneous workloads/jobs
KW - map/reduce task placement
UR - http://www.scopus.com/inward/record.url?scp=85075211192&partnerID=8YFLogxK
U2 - 10.1002/cpe.5558
DO - 10.1002/cpe.5558
M3 - Article
AN - SCOPUS:85075211192
SN - 1532-0626
VL - 32
JO - Concurrency Computation Practice and Experience
JF - Concurrency Computation Practice and Experience
IS - 7
M1 - e5558
ER -