
Distributed Top-K Join Queries Optimizing for RDF Datasets

Author(s): Jinguang Gu (College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China & Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China), Hao Dong (College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China & Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China), Zhao Liu (College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China & Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China) and Fangfang Xu (College of Computer Science and Technology, Wuhan University of Science and Technology, Wuhan, China & Hubei Province Key Laboratory of Intelligent Information Processing and Real-Time Industrial System, Wuhan, China)
Copyright: 2021
Pages: 19
Source title: Research Anthology on Architectures, Frameworks, and Integration Strategies for Distributed and Cloud Computing
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-7998-5339-8.ch092


Abstract

In recent years, the scale of RDF datasets has increased rapidly, and query processing in a traditional centralized environment can no longer meet the growing demand for data queries, especially top-k queries. Based on the Spark distributed computing system and the HBase distributed storage system, a novel method is proposed for top-k queries. A top-k query plan, STA (Spark Threshold Algorithm), is proposed to reduce the join operations on RDF data. Furthermore, a better algorithm, SSJA (Spark Simple Join Algorithm), is presented to reduce the sorting-related operations on intermediate data. A cache mechanism is also proposed to speed up SSJA. The experimental results show that SSJA performs better than STA in terms of cost and applicability, and that introducing the cache mechanism significantly improves SSJA's performance.
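
For readers unfamiliar with the problem, the following is a minimal single-machine sketch of what a top-k join query computes. It is not the chapter's STA or SSJA algorithm and uses neither Spark nor HBase; the function name `top_k_join`, the sample data, and the choice of summing two scores are all assumptions made for illustration. It joins two scored relations on a shared key and keeps only the k best combined results, using a bounded heap rather than a full sort of the intermediate data.

```python
import heapq
from collections import defaultdict


def top_k_join(left, right, k):
    """Join two lists of (key, score) pairs on key and return the k
    joined results with the highest combined score.

    Illustrative only: a plain in-memory hash join, not STA or SSJA.
    """
    # Index the right-hand side by join key (a simple hash join).
    right_by_key = defaultdict(list)
    for key, score in right:
        right_by_key[key].append(score)

    # Produce joined results lazily as (combined_score, key) pairs.
    joined = (
        (l_score + r_score, key)
        for key, l_score in left
        for r_score in right_by_key.get(key, ())
    )

    # nlargest keeps at most k candidates at a time, so the full
    # intermediate result is never sorted.
    return [(key, total) for total, key in heapq.nlargest(k, joined)]


# Hypothetical data: two scored properties attached to the same subjects.
ratings = [("film1", 9), ("film2", 4), ("film3", 7)]
popularity = [("film1", 2), ("film2", 8), ("film3", 6)]

result = top_k_join(ratings, popularity, 2)
print(result)  # [('film3', 13), ('film2', 12)]
```

In a distributed setting the join inputs are partitioned across nodes, so the cost of join and sort operations over intermediate data grows with dataset size; that is the cost the abstract says STA and SSJA aim to reduce.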
