Classification of Biological Sequences

Author(s): Pratibha Rani (International Institute of Information Technology Hyderabad, India) and Vikram Pudi (International Institute of Information Technology Hyderabad, India)
Copyright: 2013
Pages: 24
Source title: Data Mining: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-4666-2455-9.ch052

Keywords: Data Mining / Data Mining and Databases / Information Science Reference / Library & Information Science

Abstract

The rapid progress of computational biology, biotechnology, and bioinformatics over the last two decades has led to the accumulation of tremendous amounts of biological data that demand in-depth analysis. Data mining methods have been applied successfully to analyze these data. An important problem in biological data analysis is to classify a newly discovered sequence, such as a protein or DNA sequence, based on its important features and functions, using the collection of available sequences. In this chapter, we study this problem and present two Bayesian classifiers, RBNBC (Rani & Pudi, 2008a) and REBMEC (Rani & Pudi, 2008c). The algorithms used in these classifiers incorporate repeated occurrences of subsequences within each sequence (Rani, 2008). Specifically, the Repeat Based Naive Bayes Classifier (RBNBC) uses a novel formulation of Naive Bayes, while the Repeat Based Maximum Entropy Classifier (REBMEC) uses a novel framework based on the classical Generalized Iterative Scaling (GIS) algorithm.
