TY - JOUR
T1 - A Novel Data Management Scheme in Cloud for Micromachines
AU - Singh, Gurwinder
AU - Jeyaraj, Rathinaraja
AU - Sharma, Anil
AU - Paul, Anand
N1 - Publisher Copyright:
© 2023 by the authors.
PY - 2023/9
Y1 - 2023/9
N2 - In cyber-physical systems (CPS), micromachines are typically deployed across a wide range of applications, including smart industry, smart healthcare, and smart cities. Providing on-premises resources for the storage and processing of huge data collected by such CPS applications is crucial. The cloud provides scalable storage and computation resources, typically through a cluster of virtual machines (VMs) with big data tools such as Hadoop MapReduce. In such a distributed environment, job latency and makespan are highly affected by excessive non-local executions due to various heterogeneities (hardware, VM, performance, and workload level). Existing approaches handle one or more of these heterogeneities; however, they do not account for the varying performance of storage disks. In this paper, we propose a prediction-based method for placing data blocks in virtual clusters to minimize the number of non-local executions. This is accomplished by applying a linear regression algorithm to determine the performance of disk storage on each physical machine hosting a virtual cluster. This allows us to place data blocks and execute map tasks where the data blocks are located. Furthermore, map tasks are scheduled based on VM performance to reduce job latency and makespan. We simulated our ideas and compared them with the existing schedulers in the Hadoop framework. The results show that the proposed method improves MapReduce performance in terms of job latency and makespan by minimizing non-local executions compared to other methods taken for evaluation.
AB - In cyber-physical systems (CPS), micromachines are typically deployed across a wide range of applications, including smart industry, smart healthcare, and smart cities. Providing on-premises resources for the storage and processing of huge data collected by such CPS applications is crucial. The cloud provides scalable storage and computation resources, typically through a cluster of virtual machines (VMs) with big data tools such as Hadoop MapReduce. In such a distributed environment, job latency and makespan are highly affected by excessive non-local executions due to various heterogeneities (hardware, VM, performance, and workload level). Existing approaches handle one or more of these heterogeneities; however, they do not account for the varying performance of storage disks. In this paper, we propose a prediction-based method for placing data blocks in virtual clusters to minimize the number of non-local executions. This is accomplished by applying a linear regression algorithm to determine the performance of disk storage on each physical machine hosting a virtual cluster. This allows us to place data blocks and execute map tasks where the data blocks are located. Furthermore, map tasks are scheduled based on VM performance to reduce job latency and makespan. We simulated our ideas and compared them with the existing schedulers in the Hadoop framework. The results show that the proposed method improves MapReduce performance in terms of job latency and makespan by minimizing non-local executions compared to other methods taken for evaluation.
KW - MapReduce scheduling
KW - cyber-physical system
KW - data block placement
KW - data locality
UR - http://www.scopus.com/inward/record.url?scp=85172913934&partnerID=8YFLogxK
U2 - 10.3390/electronics12183807
DO - 10.3390/electronics12183807
M3 - Article
AN - SCOPUS:85172913934
SN - 2079-9292
VL - 12
JO - Electronics (Switzerland)
JF - Electronics (Switzerland)
IS - 18
M1 - 3807
ER -