IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

XML Document Clustering

Author(s): Andrea Tagarelli (University of Calabria, Italy)
Copyright: 2009
Pages: 9
Source title: Handbook of Research on Innovations in Database Technologies and Applications: Current and Future Trends
Source Author(s)/Editor(s): Viviana E. Ferraggine (UNICEN, Argentina), Jorge Horacio Doorn (UNICEN, Argentina), and Laura C. Rivero (UNICEN, Argentina)
DOI: 10.4018/978-1-60566-242-8.ch071

Abstract

The ability to provide a “standardized, extensible means of coupling semantic information within documents describing semistructured data” (Chaudhri, Rashid, & Zicari, 2003) has led to steady growth in XML (Extensible Markup Language) data sources, and XML is now touted as the driving force for representing and exchanging data on the Web. The motivation behind any clustering problem is to find an inherent structure of relationships in the data and to expose this structure as a set of clusters, where objects within the same cluster are highly similar to each other but very dissimilar from objects in other clusters. Text databases are a fruitful research area for the clustering problem. Because semistructured text data has become more prevalent on the Web, and XML is the de facto standard for such data, clustering XML documents has attracted increasing attention. Any application domain that needs to organize complex document structures (e.g., hierarchical structures with unbounded nesting, object-oriented hierarchies), or data that combines a few structured fields with largely unstructured text components, can benefit from an XML document clustering task.
