Data Mining and the Text Categorization Framework

Author(s): Paola Cerchiello (University of Pavia, Italy)
Copyright: 2009
Pages: 6
Source title: Encyclopedia of Data Warehousing and Mining, Second Edition
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-60566-010-3.ch062

Abstract

The aim of this contribution is to present one of the most important applications of text mining. Much of the literature in this field gives great relevance to the classification task (Drucker et al., 1999; Nigam et al., 2000). The application contexts are numerous and varied, ranging from text filtering (Belkin & Croft, 1992) to word sense disambiguation (Gale et al., 1993) and author identification (Elliot & Valenza, 1991), through to spam filtering and, more recently, counter-terrorism. As a consequence, over the last decade the scientific community working on this task has devoted considerable effort to solving the various problems as efficiently as possible. The pioneering studies on text categorization (TC, a.k.a. topic spotting) date back to 1961 (Maron) and are deeply rooted in the Information Retrieval context, which reveals the engineering origin of the field. The text categorization task can be briefly defined as the problem of assigning each textual document to its appropriate class or category on the basis of its content, using a properly trained classifier. In the remainder of this contribution we formalize the classification problem and detail the main related issues.
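
To make the definition above concrete, the following is a minimal sketch of a trained classifier that assigns a document to a category based on its content. It is not taken from the chapter: the choice of a multinomial Naive Bayes model with Laplace smoothing, the whitespace tokenizer, the function names (`tokenize`, `train`, `classify`) and the toy "spam"/"ham" training documents are all assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately crude tokenizer)."""
    return text.lower().split()


def train(labeled_docs):
    """Collect class counts and per-class word counts from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in labeled_docs:
        class_counts[label] += 1
        for token in tokenize(text):
            word_counts[label][token] += 1
            vocabulary.add(token)
    return class_counts, word_counts, vocabulary


def classify(text, model):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    vocab_size = len(vocabulary)
    best_label, best_score = None, float("-inf")
    for label in sorted(class_counts):
        # Log prior: fraction of training documents in this class.
        score = math.log(class_counts[label] / total_docs)
        total_words = sum(word_counts[label].values())
        for token in tokenize(text):
            if token not in vocabulary:
                continue  # ignore words never seen during training
            # Smoothed log likelihood of the token given the class.
            score += math.log(
                (word_counts[label][token] + 1) / (total_words + vocab_size)
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy training set, invented for illustration only.
TRAINING_DOCS = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report and meeting notes", "ham"),
]

model = train(TRAINING_DOCS)
print(classify("claim your free prize", model))
print(classify("notes for the monday meeting", model))
```

With this toy data the first document is assigned to "spam" and the second to "ham". A realistic system would need a far larger labeled corpus, better tokenization and some form of feature selection; the sketch only illustrates the train-then-assign structure that the definition describes.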
