IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Analytics for Noisy Unstructured Text Data II

Analytics for Noisy Unstructured Text Data II
View Sample PDF
Author(s): L. Venkata Subramaniam (IBM Research, India Research Lab, India)and Shourya Roy (IBM Research, India Research Lab, India)
Copyright: 2009
Pages: 5
Source title: Encyclopedia of Artificial Intelligence
Source Author(s)/Editor(s): Juan Ramón Rabuñal Dopico (University of A Coruña, Spain), Julian Dorado (University of A Coruña, Spain)and Alejandro Pazos (University of A Coruña, Spain)
DOI: 10.4018/978-1-59904-849-9.ch016

Purchase

View Analytics for Noisy Unstructured Text Data II on the publisher's website for pricing and purchasing information.

Abstract

The importance of text mining applications is growing proportionally with the exponential growth of electronic text. Along with the growth of internet many other sources of electronic text have become really popular. With increasing penetration of internet, many forms of communication and interaction such as email, chat, newsgroups, blogs, discussion groups, scraps etc. have become increasingly popular. These generate huge amount of noisy text data everyday. Apart from these the other big contributors in the pool of electronic text documents are call centres and customer relationship management organizations in the form of call logs, call transcriptions, problem tickets, complaint emails etc., electronic text generated by Optical Character Recognition (OCR) process from hand written and printed documents and mobile text such as Short Message Service (SMS). Though the nature of each of these documents is different but there is a common thread between all of these—presence of noise. An example of information extraction is the extraction of instances of corporate mergers, more formally MergerBetween(company1,company2,date), from an online news sentence such as: “Yesterday, New-York based Foo Inc. announced their acquisition of Bar Corp.” Opinion(product1,good), from a blog post such as: “I absolutely liked the texture of SheetK quilts.” At superficial level, there are two ways for information extraction from noisy text. The first one is cleaning text by removing noise and then applying existing state of the art techniques for information extraction. There in lies the importance of techniques for automatically correcting noisy text. In this chapter, first we will review some work in the area of noisy text correction. The second approach is to devise extraction techniques which are robust with respect to noise. Later in this chapter, we will see how the task of information extraction is affected by noise.

Related Content

Kamel Mouloudj, Vu Lan Oanh LE, Achouak Bouarar, Ahmed Chemseddine Bouarar, Dachel Martínez Asanza, Mayuri Srivastava. © 2024. 20 pages.
José Eduardo Aleixo, José Luís Reis, Sandrina Francisca Teixeira, Ana Pinto de Lima. © 2024. 52 pages.
Jorge Figueiredo, Isabel Oliveira, Sérgio Silva, Margarida Pocinho, António Cardoso, Manuel Pereira. © 2024. 24 pages.
Fatih Pinarbasi. © 2024. 20 pages.
Stavros Kaperonis. © 2024. 25 pages.
Thomas Rui Mendes, Ana Cristina Antunes. © 2024. 24 pages.
Nuno Geada. © 2024. 12 pages.
Body Bottom