TY - JOUR
T1 - Handling Non-Local Executions to Improve MapReduce Performance Using Ant Colony Optimization
AU - Singh, Gurwinder
AU - Sharma, Anil
AU - Jeyaraj, Rathinaraja
AU - Paul, Anand
N1 - Publisher Copyright: © 2013 IEEE.
PY - 2021
Y1 - 2021
N2 - Improving the performance of MapReduce scheduler is a primary objective, especially in a heterogeneous virtualized cloud environment. A map task is typically assigned with an input split, which consists of one or more data blocks. When a map task is assigned to more than one data block, non-local execution is performed. In classical MapReduce scheduling schemes, data blocks are copied over the network to a node where the map task is running. This increases job latency and consumes considerable network bandwidth within and between racks in the cloud data centre. Considering this situation, we propose a methodology, 'improving data locality using ant colony optimization (IDLACO),' to minimize the number of non-local executions and virtual network bandwidth consumption when input split is assigned to more than one data block. First, IDLACO determines a set of data blocks for each map task of a MapReduce job to perform non-local executions to minimize the job latency and virtual network consumption. Then, the target virtual machine to execute map task is determined based on its heterogeneous performance. Finally, if a set of data blocks is transferred to the same node for repeated job execution, it is decided to temporarily cache them in the target virtual machine. The performance of IDLACO is analysed and compared with fair scheduler and Holistic scheduler based on the parameters, such as the number of non-local executions, average map task latency, job latency, and amount of bandwidth consumed for a MapReduce job. Results show that IDLACO significantly outperformed the classical fair scheduler and Holistic scheduler.
AB - Improving the performance of MapReduce scheduler is a primary objective, especially in a heterogeneous virtualized cloud environment. A map task is typically assigned with an input split, which consists of one or more data blocks. When a map task is assigned to more than one data block, non-local execution is performed. In classical MapReduce scheduling schemes, data blocks are copied over the network to a node where the map task is running. This increases job latency and consumes considerable network bandwidth within and between racks in the cloud data centre. Considering this situation, we propose a methodology, 'improving data locality using ant colony optimization (IDLACO),' to minimize the number of non-local executions and virtual network bandwidth consumption when input split is assigned to more than one data block. First, IDLACO determines a set of data blocks for each map task of a MapReduce job to perform non-local executions to minimize the job latency and virtual network consumption. Then, the target virtual machine to execute map task is determined based on its heterogeneous performance. Finally, if a set of data blocks is transferred to the same node for repeated job execution, it is decided to temporarily cache them in the target virtual machine. The performance of IDLACO is analysed and compared with fair scheduler and Holistic scheduler based on the parameters, such as the number of non-local executions, average map task latency, job latency, and amount of bandwidth consumed for a MapReduce job. Results show that IDLACO significantly outperformed the classical fair scheduler and Holistic scheduler.
KW - Ant colony optimization
KW - MapReduce scheduler
KW - cloud computing
KW - heterogeneous performance
KW - virtualized environment
UR - https://www.scopus.com/pages/publications/85110701283
U2 - 10.1109/ACCESS.2021.3091675
DO - 10.1109/ACCESS.2021.3091675
M3 - Article
AN - SCOPUS:85110701283
SN - 2169-3536
VL - 9
SP - 96176
EP - 96188
JO - IEEE Access
JF - IEEE Access
M1 - 9462852
ER -