A development of streaming big data analysis system using in-memory cluster computing framework: Spark

Kiejin Park, Changwon Baek, Limei Peng

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

11 Scopus citations

Abstract

In this paper, to address the problem of processing streaming big data, we design and implement a big data analysis system using Spark, an in-memory cluster computing framework. Spark is provided by the ASF (Apache Software Foundation) open-source community and is regarded as a next-generation, high-performance cluster computing technology. The performance evaluation of the proposed system shows that Spark's response time is more than 20 times faster than that of conventional MapReduce-based Hive SQL. These results confirm that the proposed system can be applied to soft real-time big data analysis jobs on sensor data generated in a smart factory.

Original language: English
Title of host publication: Advanced Multimedia and Ubiquitous Engineering - FutureTech and MUE
Editors: Hai Jin, Young-Sik Jeong, Muhammad Khurram Khan, James J. Park
Publisher: Springer Verlag
Pages: 157-163
Number of pages: 7
ISBN (Print): 9789811015359
State: Published - 2016
Event: 11th International Conference on Future Information Technology, FutureTech 2016 - Beijing, China
Duration: 20 Apr 2016 to 22 Apr 2016

Publication series

Name: Lecture Notes in Electrical Engineering
Volume: 393
ISSN (Print): 1876-1100
ISSN (Electronic): 1876-1119

Conference

Conference: 11th International Conference on Future Information Technology, FutureTech 2016
Country/Territory: China
City: Beijing
Period: 20/04/16 to 22/04/16

Keywords

  • Cloud
  • Hive SQL
  • MapReduce
  • Real-time
  • Spark
