TY - JOUR
T1 - Multilevel Data Processing Using Parallel Algorithms for Analyzing Big Data in High-Performance Computing
AU - Ahmad, Awais
AU - Paul, Anand
AU - Din, Sadia
AU - Rathore, M. Mazhar
AU - Choi, Gyu Sang
AU - Jeon, Gwanggil
N1 - Publisher Copyright: © 2017, Springer Science+Business Media New York.
PY - 2018/6/1
Y1 - 2018/6/1
AB - The growing gap between users and Big Data analytics calls for innovative tools that address the challenges posed by the volume, variety, and velocity of Big Data. As a result, analyzing such massive volumes of data becomes computationally inefficient. Moreover, advancements in Big Data applications and data science pose additional challenges, and High-Performance Computing solutions have become a key issue that has attracted attention in recent years. However, these systems are either memoryless or computationally inefficient. There is therefore a need for a system that can efficiently analyze a stream of Big Data within those requirements. This paper presents a system architecture that enhances traditional MapReduce by incorporating a parallel processing algorithm. A complete four-tier architecture is also proposed that efficiently aggregates the data, eliminates unnecessary data, and analyzes the data with the proposed parallel processing algorithm. The proposed system architecture handles both read and write operations, which enhances the efficiency of input/output operations. To check the efficiency of the proposed algorithms, we implemented the system using Hadoop and MapReduce. MapReduce is supported by a parallel algorithm that efficiently processes huge data sets. The system is implemented using the MapReduce tool on top of Hadoop parallel nodes to generate and process graphs in near real time. The system is evaluated in terms of efficiency by considering system throughput and processing time. The results show that the proposed system is more scalable and efficient.
KW - Big Data
KW - Four-tier system architecture
KW - HPC
KW - Parallel Processing algorithm
UR - http://www.scopus.com/inward/record.url?scp=85016095060&partnerID=8YFLogxK
U2 - 10.1007/s10766-017-0498-x
DO - 10.1007/s10766-017-0498-x
M3 - Article
AN - SCOPUS:85016095060
SN - 0885-7458
VL - 46
SP - 508
EP - 527
JO - International Journal of Parallel Programming
JF - International Journal of Parallel Programming
IS - 3
ER -