IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Chinese Text Categorization via Bottom-Up Weighted Word Clustering

Chinese Text Categorization via Bottom-Up Weighted Word Clustering
View Sample PDF
Author(s): Yu-Chieh Wu (Ming-Chuan University, Taiwan)
Copyright: 2017
Pages: 13
Source title: Artificial Intelligence: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-5225-1759-7.ch022

Purchase

View Chinese Text Categorization via Bottom-Up Weighted Word Clustering on the publisher's website for pricing and purchasing information.

Abstract

Most of the researches on text categorization are focus on using bag of words. Some researches provided other methods for classification such as term phrase, Latent Semantic Indexing, and term clustering. Term clustering is an effective way for classification, and had been proved as a good method for decreasing the dimensions in term vectors. The authors used hierarchical term clustering and aggregating similar terms. In order to enhance the performance, they present a modify indexing with terms in cluster. Their test collection extracted from Chinese NETNEWS, and used the Centroid-Based classifier to deal with the problems of categorization. The results had shown that term clustering is not only reducing the dimensions but also outperform than bag of words. Thus, term clustering can be applied to text classification by using any large corpus, its objective is to save times and increase the efficiency and effectiveness. In addition to performance, these clusters can be considered as conceptual knowledge base, and kept related terms of real world.

Related Content

Kamel Mouloudj, Vu Lan Oanh LE, Achouak Bouarar, Ahmed Chemseddine Bouarar, Dachel Martínez Asanza, Mayuri Srivastava. © 2024. 20 pages.
José Eduardo Aleixo, José Luís Reis, Sandrina Francisca Teixeira, Ana Pinto de Lima. © 2024. 52 pages.
Jorge Figueiredo, Isabel Oliveira, Sérgio Silva, Margarida Pocinho, António Cardoso, Manuel Pereira. © 2024. 24 pages.
Fatih Pinarbasi. © 2024. 20 pages.
Stavros Kaperonis. © 2024. 25 pages.
Thomas Rui Mendes, Ana Cristina Antunes. © 2024. 24 pages.
Nuno Geada. © 2024. 12 pages.
Body Bottom