MapReduce scheduler to minimize the size of intermediate data in shuffle phase

  • Rathinaraja Jeyaraj
  • V. S. Ananthanarayana
  • Anand Paul

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

4 Scopus citations

Abstract

Hadoop MapReduce is one of the cost-effective ways to process huge volumes of data in this decade. Although it is open source, setting up Hadoop on-premises is not affordable for small-scale businesses and research entities. Consuming Hadoop MapReduce as a service from the cloud is therefore increasingly common, as it scales on demand and follows a pay-per-use model. In such a multi-tenant environment, virtual bandwidth is an expensive commodity, and co-located virtual machines compete with each other for it. A study shows that 26%-70% of MapReduce job latency is due to the shuffle phase of the MapReduce execution sequence. The primary expectation of a typical cloud user is to minimize the service usage cost. Allocating less bandwidth to the service costs less but increases job latency and, consequently, makespan. We address this trade-off by minimizing, at the application level, the amount of intermediate data generated in the shuffle phase. To achieve this, we propose the Time Sharing MapReduce Job Scheduler, which minimizes the amount of intermediate data and thereby cuts the service cost. As a by-product, MapReduce job latency and makespan also improve. Results show that the proposed model reduces the size of intermediate data by up to 62.1% compared to the classical schedulers with combiners.

Original language: English
Title of host publication: Proceedings - 18th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2019
Editors: Simon Xu, Yongbin Wang, Mingyong Shi, Wenqian Shang, Jiefeng Liu, Kailong Zhang
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 30-34
Number of pages: 5
ISBN (Electronic): 9781728108018
State: Published - Jun 2019
Event: 18th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2019 - Beijing, China
Duration: 17 Jun 2019 → 19 Jun 2019

Publication series

Name: Proceedings - 18th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2019

Conference

Conference: 18th IEEE/ACIS International Conference on Computer and Information Science, ICIS 2019
Country/Territory: China
City: Beijing
Period: 17/06/19 → 19/06/19

Keywords

  • MapReduce scheduler
  • Shuffle phase
