Abstract
Social network services continuously generate formal, semi-formal, and informal data, and such social data have complex and diverse features. Today, massive volumes of social data are created in real time, and existing query processing systems fall short in handling them. In particular, it is very difficult to analyze such social data in real time. In this paper, to achieve high-speed response and analysis for social big data, we use distributed in-memory Spark SQL to construct a high-speed query processing system for social data. In Spark SQL, social data are loaded into high-speed cluster memory instead of low-speed hard disks, which can increase query performance drastically. Moreover, we use a different data processing approach based on column-oriented Spark SQL data frames rather than the existing row-oriented record units. This makes high-speed query processing possible for informal social data. A performance evaluation of the proposed query processing system shows that it processes up to terabytes (TB) of social data at high speed, and that query processing time grows linearly with the volume of social big data.
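The contrast between row-oriented records and a column-oriented data frame can be illustrated with a minimal plain-Python sketch. This is not Spark SQL's implementation. The sample posts and the field names `author`, `likes`, and `text` are hypothetical. A query that totals likes for one author needs only two columns, so a columnar layout lets it read those two lists and skip the `text` column entirely.

```python
from typing import Any, Dict, List

# Hypothetical sample of social posts stored as row records (one dict per post).
rows: List[Dict[str, Any]] = [
    {"author": "alice", "likes": 10, "text": "hello"},
    {"author": "bob", "likes": 3, "text": "spark"},
    {"author": "alice", "likes": 7, "text": "sql"},
]


def to_columns(records: List[Dict[str, Any]]) -> Dict[str, List[Any]]:
    """Pivot row records into a column layout: one list per field."""
    columns: Dict[str, List[Any]] = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return columns


def total_likes_rows(records: List[Dict[str, Any]], author: str) -> int:
    # Row layout: iterates over whole record objects, each of which
    # carries every field (including the unused 'text').
    return sum(r["likes"] for r in records if r["author"] == author)


def total_likes_columns(columns: Dict[str, List[Any]], author: str) -> int:
    # Column layout: only the 'author' and 'likes' lists are scanned;
    # the 'text' column is never read.
    return sum(
        likes
        for a, likes in zip(columns["author"], columns["likes"])
        if a == author
    )


columns = to_columns(rows)
print(total_likes_rows(rows, "alice"), total_likes_columns(columns, "alice"))
```

Both functions return the same answer; the difference lies in how much data each one touches. In Spark SQL the same question could be phrased as a query such as `SELECT SUM(likes) FROM posts WHERE author = 'alice'` over a hypothetical `posts` table, with the columnar data frame engine deciding which columns to read.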
Field | Value |
---|---|
Original language | English |
Pages (from-to) | 8221-8225 |
Number of pages | 5 |
Journal | International Journal of Applied Engineering Research |
Volume | 11 |
Issue number | 14 |
State | Published - 2016 |
Keywords
- Hadoop
- Query processing
- Social data
- Spark
- Spark SQL