IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Creating a Data Lakehouse for a South African Government-Sector Learning Control Enforcing Quality Control for Incremental Extract-Load-Transform Pipe

Author(s): Dharmesh Dhabliya (Vishwakarma Institute of Information Technology, India), Vivek Veeraiah (Sri Siddharth Institute of Technology, Sri Siddhartha Academy of Higher Education, India), Sukhvinder Singh Dari (Symbiosis Law School, Symbiosis International University, India), Jambi Ratna Raja Kumar (Genba Sopanrao Moze College of Engineering, India), Ritika Dhabliya (ResearcherConnect, India), Sabyasachi Pramanik (Haldia Institute of Technology, India), and Ankur Gupta (Vaish College of Engineering, India)
Copyright: 2024
Pages: 22
Source title: Big Data Quantification for Complex Decision-Making
Source Author(s)/Editor(s): Chao Zhang (Shanxi University, China) and Wentao Li (Southwest University, China)
DOI: 10.4018/979-8-3693-1582-8.ch004


Abstract

The Durban University of Technology is currently engaged in a project to create a data lakehouse system for a Training Authority in the South African Government sector. The system is crucial for improving the monitoring and evaluation capacity of the training authority and for ensuring efficient service delivery. Ensuring that high-quality data is fed into the lakehouse is equally important, since low data quality reduces the effectiveness of the lakehouse system. This chapter examines quality control methods for ingestion-layer pipelines and presents a framework for ensuring data quality. The data quality metrics considered were completeness, accuracy, integrity, correctness, and timeliness. The efficiency of the framework was assessed by implementing it on a sample semi-structured dataset. Suggestions for future development include integrating data from a wider range of sources and providing triggers for incremental data ingestion.
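The abstract names the quality dimensions but not how they might be scored. The following is a minimal sketch, assuming semi-structured records arrive as Python dicts and using a hypothetical schema (`learner_id`, `programme`, `enrolled_at`). It shows one way an ingestion-layer check could score an incremental batch on completeness, correctness, integrity, and timeliness before the batch is loaded. The metric definitions, field names, and thresholds are illustrative assumptions, not the chapter's framework. Accuracy is omitted because it needs a trusted reference source to compare against.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical schema for one semi-structured learner record: field -> expected type.
# These field names are illustrative assumptions, not taken from the chapter.
SCHEMA = {"learner_id": str, "programme": str, "enrolled_at": str}


def _present(value):
    """Treat None and empty strings as missing values."""
    return value not in (None, "")


def completeness(records, schema=SCHEMA):
    """Share of required field slots that are present and non-empty."""
    slots = len(records) * len(schema)
    filled = sum(_present(r.get(f)) for r in records for f in schema)
    return filled / slots if slots else 1.0


def correctness(records, schema=SCHEMA):
    """Share of present values whose Python type matches the schema (a simple proxy)."""
    checked = [isinstance(r[f], t) for r in records
               for f, t in schema.items() if _present(r.get(f))]
    return sum(checked) / len(checked) if checked else 1.0


def integrity(records, key="learner_id"):
    """Share of distinct key values among non-missing keys (1.0 means no duplicates)."""
    keys = [r[key] for r in records if _present(r.get(key))]
    return len(set(keys)) / len(keys) if keys else 1.0


def timeliness(records, now, max_age, ts_field="enrolled_at"):
    """Share of records whose ISO-8601 timestamp lies within max_age before now."""
    if not records:
        return 1.0
    timely = 0
    for r in records:
        try:
            age = now - datetime.fromisoformat(r.get(ts_field))
        except (TypeError, ValueError):
            continue  # missing, non-string, unparseable or timezone-naive values count as late
        if timedelta(0) <= age <= max_age:
            timely += 1
    return timely / len(records)


def quality_gate(records, thresholds, now, max_age):
    """Score one incremental batch; pass it only if every thresholded metric is met."""
    report = {
        "completeness": completeness(records),
        "correctness": correctness(records),
        "integrity": integrity(records),
        "timeliness": timeliness(records, now, max_age),
    }
    passed = all(report[m] >= t for m, t in thresholds.items())
    return passed, report


if __name__ == "__main__":
    batch = [
        {"learner_id": "L001", "programme": "Plumbing", "enrolled_at": "2024-01-10T08:00:00+00:00"},
        {"learner_id": "L002", "programme": "", "enrolled_at": "2024-01-11T09:30:00+00:00"},
        {"learner_id": "L002", "programme": "Welding", "enrolled_at": 20240112},
    ]
    passed, report = quality_gate(
        batch,
        thresholds={"completeness": 0.95, "correctness": 0.95, "integrity": 1.0, "timeliness": 0.9},
        now=datetime(2024, 1, 15, tzinfo=timezone.utc),
        max_age=timedelta(days=30),
    )
    print(passed)  # False
    print({m: round(v, 3) for m, v in report.items()})
    # {'completeness': 0.889, 'correctness': 0.875, 'integrity': 0.667, 'timeliness': 0.667}
```

Scoring each incremental batch separately, rather than the whole lake, is one way to let a pipeline quarantine a failing batch without blocking later loads. The thresholds shown are placeholders that a real deployment would tune.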

Related Content

Yu Bin, Xiao Zeyu, Dai Yinglong. © 2024. 34 pages.
Liyin Wang, Yuting Cheng, Xueqing Fan, Anna Wang, Hansen Zhao. © 2024. 21 pages.
Tao Zhang, Zaifa Xue, Zesheng Huo. © 2024. 32 pages.
Dharmesh Dhabliya, Vivek Veeraiah, Sukhvinder Singh Dari, Jambi Ratna Raja Kumar, Ritika Dhabliya, Sabyasachi Pramanik, Ankur Gupta. © 2024. 22 pages.
Yi Xu. © 2024. 37 pages.
Chunmao Jiang. © 2024. 22 pages.
Hatice Kübra Özensel, Burak Efe. © 2024. 23 pages.