Mining Statistically Significant Substrings based on the Chi-Square Measure

Author(s): Sourav Dutta (IBM Research Lab, India) and Arnab Bhattacharya (Indian Institute of Technology Kanpur, India)
Copyright: 2013
Pages: 10
Source title: Bioinformatics: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-4666-3604-0.ch083

Abstract

With the tremendous expansion of reservoirs of sequence data stored worldwide, efficient mining of large string databases in various domains, including intrusion detection systems, player statistics, texts, and proteins, has emerged as a practical challenge. Searching for an unusual pattern within long strings of data is one of the foremost requirements for many diverse applications. Given a string, the problem is to identify the substrings that differ the most from the expected or normal behavior, i.e., the substrings that are statistically significant (or, in other words, less likely to occur due to chance alone). We first survey and analyze the different statistical measures available to this end. Next, we argue that the most appropriate metric is the chi-square measure. Finally, we discuss different approaches and algorithms proposed for retrieving the top-k substrings with the largest chi-square measure.
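
To make the problem statement concrete, the following is a minimal sketch of a brute-force baseline, not one of the algorithms surveyed in the chapter. It assumes a memoryless null model in which each character occurs independently with a fixed probability, so that a substring of length l with observed character counts Y_i has the Pearson chi-square value X^2 = sum_i (Y_i - l*p_i)^2 / (l*p_i). The function names chi_square and top_k_substrings, and the representation of the model as a dictionary of probabilities, are illustrative assumptions.

```python
import heapq


def chi_square(counts, length, probs):
    """Pearson chi-square of observed character counts against the null model.

    Assumption: characters are drawn independently with probabilities `probs`.
    """
    return sum(
        (counts.get(c, 0) - length * p) ** 2 / (length * p)
        for c, p in probs.items()
    )


def top_k_substrings(s, probs, k):
    """Return the k substrings of s with the largest chi-square value.

    Results are (score, start, end) tuples for s[start:end], best first.
    Brute force over all O(n^2) substrings; assumes every character of s
    appears in `probs` with a nonzero probability.
    """
    # prefix[c][i] holds the number of occurrences of c in s[:i],
    # so any substring's counts come from two lookups per character.
    prefix = {c: [0] * (len(s) + 1) for c in probs}
    for i, ch in enumerate(s):
        for c in probs:
            prefix[c][i + 1] = prefix[c][i] + (1 if ch == c else 0)

    heap = []  # min-heap holding the best k candidates seen so far
    for start in range(len(s)):
        for end in range(start + 1, len(s) + 1):
            counts = {c: prefix[c][end] - prefix[c][start] for c in probs}
            item = (chi_square(counts, end - start, probs), start, end)
            if len(heap) < k:
                heapq.heappush(heap, item)
            elif item > heap[0]:
                heapq.heapreplace(heap, item)
    return sorted(heap, reverse=True)


if __name__ == "__main__":
    model = {"a": 0.5, "b": 0.5}  # hypothetical null model
    for score, start, end in top_k_substrings("ababaaaa", model, 3):
        print(f"{score:.2f}  s[{start}:{end}]")
```

Under this assumed model, the run of four consecutive "a" characters at the end of the example string deviates most from the expected half-and-half split, so it ranks first. The quadratic number of candidate substrings is what makes faster approaches necessary for long strings.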