IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

A Machine Learning Approach to Data Cleaning in Databases and Data Warehouses

A Machine Learning Approach to Data Cleaning in Databases and Data Warehouses
View Sample PDF
Author(s): Hamid Haidarian Shahri (University of Maryland, USA)
Copyright: 2009
Pages: 16
Source title: Database Technologies: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): John Erickson (University of Nebraska, Omaha, USA)
DOI: 10.4018/978-1-60566-058-5.ch136

Purchase

View A Machine Learning Approach to Data Cleaning in Databases and Data Warehouses on the publisher's website for pricing and purchasing information.

Abstract

Entity resolution (also known as duplicate elimination) is an important part of the data cleaning process, especially in data integration and warehousing, where data are gathered from distributed and inconsistent sources. Learnable string similarity measures are an active area of research in the entity resolution problem. Our proposed framework builds upon our earlier work on entity resolution, in which fuzzy rules and membership functions are defined by the user. Here, we exploit neuro-fuzzy modeling for the first time to produce a unique adaptive framework for entity resolution, which automatically learns and adapts to the specific notion of similarity at a metalevel. This framework encompasses many of the previous work on trainable and domain-specific similarity measures. Employing fuzzy inference, it removes the repetitive task of hard-coding a program based on a schema, which is usually required in previous approaches. In addition, our extensible framework is very flexible for the end user. Hence, it can be utilized in the production of an intelligent tool to increase the quality and accuracy of data.

Related Content

Renjith V. Ravi, Mangesh M. Ghonge, P. Febina Beevi, Rafael Kunst. © 2022. 24 pages.
Manimaran A., Chandramohan Dhasarathan, Arulkumar N., Naveen Kumar N.. © 2022. 20 pages.
Ram Singh, Rohit Bansal, Sachin Chauhan. © 2022. 19 pages.
Subhodeep Mukherjee, Manish Mohan Baral, Venkataiah Chittipaka. © 2022. 17 pages.
Vladimir Nikolaevich Kustov, Ekaterina Sergeevna Selanteva. © 2022. 23 pages.
Krati Reja, Gaurav Choudhary, Shishir Kumar Shandilya, Durgesh M. Sharma, Ashish K. Sharma. © 2022. 18 pages.
Nwosu Anthony Ugochukwu, S. B. Goyal. © 2022. 23 pages.
Body Bottom