Improving the Quality of Linked Data Using Statistical Distributions

Author(s): Heiko Paulheim (University of Mannheim, Germany) and Christian Bizer (University of Mannheim, Germany)
Copyright: 2018
Pages: 27
Source title: Information Retrieval and Management: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-5225-5191-1.ch074

Abstract

Linked Data on the Web is created from structured data sources (such as relational databases), semi-structured sources (such as Wikipedia), or unstructured sources (such as text). In the latter two cases, the generated Linked Data is likely to be noisy and incomplete. In this paper, we present two algorithms that exploit statistical distributions of properties and types to improve the quality of incomplete and noisy Linked Data sets: SDType adds missing type statements, and SDValidate identifies faulty statements. Neither algorithm uses external knowledge; both operate only on the data itself. We evaluate the algorithms on the DBpedia and NELL knowledge bases and show that they are both accurate and scalable. Both algorithms were used in building the DBpedia 3.9 release: SDType added 3.4 million missing type statements, and SDValidate led to the removal of 13,000 erroneous RDF statements from the knowledge base.
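The abstract does not spell out how SDType works, so the following is only a minimal sketch of the general idea of inferring a missing type from the statistical distribution of types per property. It assumes uniform weights and looks only at properties in subject position. The function names, the threshold, and the toy triples are all hypothetical and are not taken from the chapter.

```python
from collections import Counter, defaultdict


def learn_type_distributions(triples, types):
    """Estimate P(type | subject uses property) from already-typed subjects.

    Simplifying assumption: only the subject position is considered.
    """
    counts = defaultdict(Counter)  # property -> Counter of subject types
    totals = Counter()             # property -> number of typed subjects seen
    for subj, prop, _obj in triples:
        subj_types = types.get(subj)
        if not subj_types:
            continue  # untyped subjects carry no evidence
        totals[prop] += 1
        for t in subj_types:
            counts[prop][t] += 1
    return {
        prop: {t: c / totals[prop] for t, c in type_counts.items()}
        for prop, type_counts in counts.items()
    }


def predict_types(subject, triples, distributions, threshold=0.5):
    """Average the per-property type distributions (uniform weights, an
    assumption) and keep every type whose score reaches the threshold."""
    props = [p for s, p, _ in triples if s == subject and p in distributions]
    if not props:
        return {}
    scores = Counter()
    for prop in props:
        for t, prob in distributions[prop].items():
            scores[t] += prob
    return {
        t: total / len(props)
        for t, total in scores.items()
        if total / len(props) >= threshold
    }


# Hypothetical toy data: "Mannheim" has no type statement.
triples = [
    ("Berlin", "population", "3.6M"),
    ("Berlin", "country", "Germany"),
    ("Paris", "population", "2.1M"),
    ("Paris", "country", "France"),
    ("Einstein", "birthPlace", "Ulm"),
    ("Einstein", "country", "Germany"),
    ("Mannheim", "population", "0.3M"),
    ("Mannheim", "country", "Germany"),
]
types = {"Berlin": {"City"}, "Paris": {"City"}, "Einstein": {"Person"}}

distributions = learn_type_distributions(triples, types)
predicted = predict_types("Mannheim", triples, distributions)
print(predicted)
```

On this toy data, "population" is used only by cities and "country" mostly by cities, so "City" is the only type that passes the threshold for "Mannheim". The same per-property distributions could in principle be used the other way round, to flag statements whose types are very unlikely for the property involved, which is the kind of check the abstract attributes to SDValidate.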
