The IRMA Community
Newsletters
Research IRM
Click a keyword to search titles using our InfoSci-OnDemand powered search:
|
Hierarchical Document Clustering
|
|
Author(s): Benjamin C.M. Fung (Concordia University, Canada), Ke Wang (Simon Fraser University, Canada)and Martin Ester (Simon Fraser University, Canada)
Copyright: 2009
Pages: 6
Source title:
Encyclopedia of Data Warehousing and Mining, Second Edition
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-60566-010-3.ch150
Purchase
|
Abstract
Document clustering is an automatic grouping of text documents into clusters so that documents within a cluster have high similarity in comparison to one another, but are dissimilar to documents in other clusters. Unlike document classification (Wang, Zhou, & He, 2001), no labeled documents are provided in clustering; hence, clustering is also known as unsupervised learning. Hierarchical document clustering organizes clusters into a tree or a hierarchy that facilitates browsing. The parent-child relationship among the nodes in the tree can be viewed as a topic-subtopic relationship in a subject hierarchy such as the Yahoo! directory. This chapter discusses several special challenges in hierarchical document clustering: high dimensionality, high volume of data, ease of browsing, and meaningful cluster labels. State-of-the-art document clustering algorithms are reviewed: the partitioning method (Steinbach, Karypis, & Kumar, 2000), agglomerative and divisive hierarchical clustering (Kaufman & Rousseeuw, 1990), and frequent itemset-based hierarchical clustering (Fung, Wang, & Ester, 2003). The last one, which was developed by the authors, is further elaborated since it has been specially designed to address the hierarchical document clustering problem.
Related Content
|
Girija Ramdas, Irfan Naufal Umar, Nurullizam Jamiat, Nurul Azni Mhd Alkasirah.
© 2024.
18 pages.
|
|
Natalia Riapina.
© 2024.
29 pages.
|
|
Xinyu Chen, Wan Ahmad Jaafar Wan Yahaya.
© 2024.
21 pages.
|
|
Fatema Ahmed Wali, Zahra Tammam.
© 2024.
24 pages.
|
|
Su Jiayuan, Jingru Zhang.
© 2024.
26 pages.
|
|
Pua Shiau Chen.
© 2024.
21 pages.
|
|
Minh Tung Tran, Thu Trinh Thi, Lan Duong Hoai.
© 2024.
23 pages.
|
|
|