The IRMA Community
Newsletters
Research IRM
Click a keyword to search titles using our InfoSci-OnDemand powered search:
|
Nesting Strategies for Enabling Nimble MapReduce Dataflows for Large RDF Data
Abstract
Graph and semi-structured data are usually modeled in relational processing frameworks as “thin” relations (node, edge, node) and processing such data involves a lot of join operations. Intermediate results of joins with multi-valued attributes or relationships, contain redundant subtuples due to repetition of single-valued attributes. The amount of redundant content is high for real-world multi-valued relationships in social network (millions of Twitter followers of popular celebrities) or biological (multiple references to related proteins) datasets. In MapReduce-based platforms such as Apache Hive and Pig, redundancy in intermediate results contributes avoidable costs to the overall I/O, sorting, and network transfer overhead of join-intensive workloads due to longer workflows. Consequently, providing techniques for dealing with such redundancy will enable more nimble execution of such workflows. This paper argues for the use of a nested data model for representing intermediate data concisely using nesting-aware dataflow operators that allow for lazy and partial unnesting strategies. This approach reduces the overall I/O and network footprint of a workflow by concisely representing intermediate results during most of a workflow's execution, until complete unnesting is absolutely necessary. The proposed strategies are integrated into Apache Pig and experimental evaluation over real-world and synthetic benchmark datasets confirms their superiority over relational-style MapReduce systems such as Apache Pig and Hive.
Related Content
Hrithik Raj, Ritu Punhani, Ishika Punhani.
© 2023.
31 pages.
|
Divi Anand, Isha Kaushik, Jasmehar Singh Mann, Ritu Punhani, Ishika Punhani.
© 2023.
21 pages.
|
Jayanthi G., Purushothaman R..
© 2023.
10 pages.
|
Anshika Gupta, Shuchi Sirpal.
© 2023.
14 pages.
|
Reet Kaur Kohli, Seneha Santoshi, Sunishtha S. Yadav, Vandana Chauhan.
© 2023.
13 pages.
|
Poonam Tanwar.
© 2023.
14 pages.
|
Monika Mehta, Shivani Mishra, Santosh Kumar, Muskaan Bansal.
© 2023.
16 pages.
|
|
|