NetCube: Fast, Approximate Database Queries Using Bayesian Networks

View Sample PDF

Author(s): Dimitris Margaritis (Iowa State University, USA), Christos Faloutsos (Carnegie Mellon University, USA)and Sebastian Thrun (Stanford University, USA)
Copyright: 2009
Pages: 26
Source title: Database Technologies: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): John Erickson (University of Nebraska, Omaha, USA)
DOI: 10.4018/978-1-60566-058-5.ch120

Keywords: Data Mining and Databases / Databases / Engineering Science Reference / Library & Information Science

Purchase

View NetCube: Fast, Approximate Database Queries Using Bayesian Networks on the publisher's website for pricing and purchasing information.

Abstract

We present a novel method for answering count queries from a large database approximately and quickly. Our method implements an approximate DataCube of the application domain, which can be used to answer any conjunctive count query that can be formed by the user. The DataCube is a conceptual device that in principle stores the number of matching records for all possible such queries. However, because its size and generation time are inherently exponential, our approach uses one or more Bayesian networks to implement it approximately. Bayesian networks are statistical graphical models that can succinctly represent the underlying joint probability distribution of the domain, and can therefore be used to calculate approximate counts for any conjunctive query combination of attribute values and “don’t cares.” The structure and parameters of these networks are learned from the database in a preprocessing stage. By means of such a network, the proposed method, called NetCube, exploits correlations and independencies among attributes to answer a count query quickly without accessing the database. Our preprocessing algorithm scales linearly on the size of the database, and is thus scalable; it is also parallelizable with a straightforward parallel implementation. We give an algorithm for estimating the count result of arbitrary queries that is fast (constant) on the database size. Our experimental results show that NetCubes have fast generation and use, achieve excellent compression and have low reconstruction error. Moreover, they naturally allow for visualization and data mining, at no extra cost.

The IRMA Community

Research IRM

NetCube: Fast, Approximate Database Queries Using Bayesian Networks

Purchase

Abstract

Related Content

IRMA Sponsors