Data Mining and the Text Categorization Framework

Author(s): Paola Cerchiello (University of Pavia, Italy)
Copyright: 2009
Pages: 6
Source title: Encyclopedia of Data Warehousing and Mining, Second Edition
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-60566-010-3.ch062

Abstract

The aim of this contribution is to present one of the most important applications of text mining. Much of the literature in this field gives great relevance to the classification task (Drucker et al., 1999; Nigam et al., 2000). The application contexts are numerous and varied, ranging from text filtering (Belkin & Croft, 1992) to word sense disambiguation (Gale et al., 1993) and author identification (Elliot & Valenza, 1991), through anti-spam and, more recently, anti-terrorism applications. As a consequence, over the last decade the scientific community working on this task has devoted considerable effort to solving the different problems in the most efficient way. The pioneering studies on text categorization (TC, a.k.a. topic spotting) date back to 1961 (Maron) and are deeply rooted in the Information Retrieval context, which reveals the engineering origin of the field under discussion. The text categorization task can be briefly defined as the problem of assigning every textual document to its appropriate class or category on the basis of its content, using a properly trained classifier. In the following parts of this contribution we formalize the classification problem and detail the main related issues.
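
As a rough illustration of the definition above (assigning each document to a category on the basis of its content, using a trained classifier), the following is a minimal sketch and not the method described in the chapter. It assumes a multinomial naive Bayes model with add-one smoothing, chosen here only because it is compact. The function names (`tokenize`, `train`, `classify`) and the toy training documents are hypothetical and invented for this example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Lowercase and split on whitespace; a real system would do far more.
    return text.lower().split()


def train(labeled_docs):
    """Collect the counts needed by a multinomial naive Bayes classifier."""
    doc_counts = Counter()              # number of training documents per category
    word_counts = defaultdict(Counter)  # word frequencies per category
    vocabulary = set()
    for text, category in labeled_docs:
        doc_counts[category] += 1
        for word in tokenize(text):
            word_counts[category][word] += 1
            vocabulary.add(word)
    return doc_counts, word_counts, vocabulary


def classify(model, text):
    """Return the category with the highest log-posterior score."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_category, best_score = None, float("-inf")
    for category in sorted(doc_counts):
        # Log prior: share of training documents in this category.
        score = math.log(doc_counts[category] / total_docs)
        total_words = sum(word_counts[category].values())
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # skip words never seen during training
            # Add-one (Laplace) smoothing avoids zero probabilities.
            count = word_counts[category][word] + 1
            score += math.log(count / (total_words + len(vocabulary)))
        if score > best_score:
            best_category, best_score = category, score
    return best_category


# Toy training set, invented for illustration only.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("project report due monday", "ham"),
]

model = train(TRAINING)
print(classify(model, "claim your free prize"))
print(classify(model, "monday project meeting"))
```

The two steps mirror the definition: `train` builds the classifier from documents whose categories are already known, and `classify` assigns a new document to a category using only the words it contains.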