A design of high-speed big data query processing system for social data analysis: Using spark SQL

Kiejin Park, Limei Peng

Research output: Contribution to journal › Article › peer-review

4 Scopus citations

Abstract

Social network services continuously generate formal, semi-formal and informal data, and such social data have complicated and diverse features. Massive volumes of social data are now created in real time, and existing query processing systems fall short in handling them. In particular, it is very difficult to analyze this kind of social data in real time. In this paper, to achieve high-speed response and analysis for social big data, distributed in-memory Spark SQL is used to construct a high-speed query processing system for social data. In Spark SQL, social data are loaded into high-speed cluster memory instead of low-speed hard disks, which can increase query performance drastically. Moreover, we use a different data processing approach based on the column-oriented Spark SQL data frame rather than the existing row-oriented record unit. This makes high-speed query processing possible for informal social data. An evaluation of the proposed query processing system shows that the amount of social data processed at high speed reaches terabytes (TB), and that query processing performance (i.e., processing time) exhibits a linear pattern with respect to the volume of social big data.
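The contrast the abstract draws between row-oriented records and a column-oriented data frame can be illustrated with a minimal sketch in plain Python. This is not the authors' system and not Spark SQL's internal implementation: the sample posts, the field names (`user`, `lang`, `likes`) and the helper functions are all hypothetical, and ordinary lists stand in for in-memory columnar storage.

```python
# Illustrative only: plain Python lists stand in for in-memory columnar storage.
# The records and field names below are hypothetical sample data.

# Row-oriented layout: one record (dict) per social post.
rows = [
    {"user": "alice", "lang": "en", "likes": 12},
    {"user": "bob", "lang": "ko", "likes": 7},
    {"user": "carol", "lang": "en", "likes": 30},
]


def to_columns(records):
    """Pivot a list of uniform records into a dict of column lists."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_likes_row_oriented(records, lang):
    # Every whole record is visited, even though only two fields are needed.
    return sum(r["likes"] for r in records if r["lang"] == lang)


def total_likes_column_oriented(columns, lang):
    # Only the two columns the query references are scanned.
    return sum(
        likes
        for likes, post_lang in zip(columns["likes"], columns["lang"])
        if post_lang == lang
    )


columns = to_columns(rows)
row_result = total_likes_row_oriented(rows, "en")
col_result = total_likes_column_oriented(columns, "en")
print(row_result, col_result)  # prints "42 42"
```

Both layouts return the same answer. The difference is in how much data each query has to touch: the columnar version never reads the `user` column, which is the kind of saving a column-oriented data frame is meant to provide on wide records.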

Original language: English
Pages (from-to): 8221-8225
Number of pages: 5
Journal: International Journal of Applied Engineering Research
Volume: 11
Issue number: 14
State: Published - 2016

Keywords

  • Hadoop
  • Query processing
  • Social data
  • Spark
  • Spark SQL
