Distributed Data Mining and its Applications to Intelligent Textual Information Processing


Author(s): Shibin Qiu (University of New Mexico, USA) and Mei Qiu (Emcore Corporation, USA)
Copyright: 2004
Pages: 5
Source title: Innovations Through Information Technology
Source Editor(s): Mehdi Khosrow-Pour, D.B.A. (Information Resources Management Association, USA)
DOI: 10.4018/978-1-59140-261-9.ch093
ISBN13: 9781616921255
EISBN13: 9781466665347

Keywords: Engineering Science Reference / IT Research & Theory / IT Research and Theory / Library & Information Science

Abstract

Textual information processing is of fundamental importance because of the massive volume of documents, especially online text, that we need to process every day. In this paper, we study data mining techniques applied to intelligent textual information processing in distributed environments, including text classification, information extraction (IE), and topic detection and tracking (TDT). These intelligent processing techniques will improve the quality and efficiency of information resource management and utilization. Their statistical models and computational algorithms pose challenges for research in data mining and distributed/parallel computing. When successfully applied, they will benefit applications in IT, digital libraries, and information retrieval. Specifically, we study the distributed computing of the following algorithms: a naïve Bayes classifier combined with expectation-maximization (EM) for text classification, a hidden Markov model for information extraction, and deterministic annealing with EM for topic detection and tracking. We also study the performance of the proposed algorithms and experimentally evaluate the resulting improvements.
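The abstract does not say how the naïve Bayes/EM computation is split across machines. One common way to distribute it exploits the fact that the EM sufficient statistics (expected class counts and expected per-class word counts) are sums over documents: each worker runs the E-step on its own partition, and a coordinator adds the partial counts and performs the M-step. The Python below is an illustrative sketch under that assumption, not the authors' algorithm; the function names (local_stats, merge, m_step, distributed_nb_em), the add-alpha smoothing, the shared vocabulary and the toy data are all hypothetical, and the workers are simulated sequentially.

```python
import math
from collections import Counter

# Hypothetical setup: a document is a (tokens, label) pair, label is None when
# unlabeled, and each "partition" stands in for the data held by one worker.


def posterior(model, tokens, classes):
    """Return P(class | tokens) under a multinomial naive Bayes model."""
    priors, cond = model
    logp = {
        c: math.log(priors[c]) + sum(math.log(cond[c][w]) for w in tokens if w in cond[c])
        for c in classes
    }
    top = max(logp.values())  # subtract the max before exponentiating for stability
    z = sum(math.exp(v - top) for v in logp.values())
    return {c: math.exp(logp[c] - top) / z for c in classes}


def local_stats(partition, model, classes):
    """Worker-side E-step: expected class and word counts for one partition."""
    class_counts = {c: 0.0 for c in classes}
    word_counts = {c: Counter() for c in classes}
    for tokens, label in partition:
        if label is not None:
            post = {c: 1.0 if c == label else 0.0 for c in classes}  # hard assignment
        elif model is None:
            continue  # first round: no model yet, so only labeled documents count
        else:
            post = posterior(model, tokens, classes)  # soft assignment
        for c in classes:
            class_counts[c] += post[c]
            for w in tokens:
                word_counts[c][w] += post[c]
    return class_counts, word_counts


def merge(stats_list, classes):
    """Coordinator-side reduce: the statistics are additive across partitions."""
    total_classes = {c: 0.0 for c in classes}
    total_words = {c: Counter() for c in classes}
    for class_counts, word_counts in stats_list:
        for c in classes:
            total_classes[c] += class_counts[c]
            total_words[c].update(word_counts[c])  # Counter.update adds counts
    return total_classes, total_words


def m_step(class_counts, word_counts, vocab, classes, alpha=1.0):
    """Coordinator-side M-step with add-alpha (Laplace) smoothing."""
    total = sum(class_counts.values())
    priors = {c: (class_counts[c] + alpha) / (total + alpha * len(classes)) for c in classes}
    cond = {}
    for c in classes:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        cond[c] = {w: (word_counts[c][w] + alpha) / denom for w in vocab}
    return priors, cond


def distributed_nb_em(partitions, classes, iterations=5):
    """Round 0 fits naive Bayes on labeled data; each later round is one EM step."""
    vocab = {w for part in partitions for tokens, _ in part for w in tokens}
    model = None
    for _ in range(iterations + 1):
        # In a real deployment these calls would run in parallel on the workers.
        stats = [local_stats(part, model, classes) for part in partitions]
        model = m_step(*merge(stats, classes), vocab, classes)
    return model


if __name__ == "__main__":
    partitions = [
        [(["ball", "goal", "team"], "sports"), (["vote", "election"], None)],
        [(["vote", "senate", "law"], "politics"), (["goal", "match", "team"], None)],
    ]
    classes = ["sports", "politics"]
    model = distributed_nb_em(partitions, classes)
    print(posterior(model, ["team", "match"], classes))
```

Because the merged counts equal the counts a single machine would compute, the distributed model should match a single-partition run up to floating-point rounding, and only the per-round statistics, not the documents themselves, need to cross the network.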
