IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Issues and Challenges in Web Crawling for Information Extraction

Author(s): Subrata Paul (Vignan Institute of Technology and Management, India), Anirban Mitra (Vignan Institute of Technology and Management, India), and Swagata Dey (MIPS, MITS, Rayagada, India)
Copyright: 2017
Pages: 29
Source title: Bio-Inspired Computing for Information Retrieval Applications
Source Author(s)/Editor(s): D.P. Acharjya (School of Computing Science and Engineering, VIT University, India) and Anirban Mitra (Vignan Institute of Technology and Management, India)
DOI: 10.4018/978-1-5225-2375-8.ch004


Abstract

Computational biology and bio-inspired techniques are part of a larger revolution that is increasing the processing, storage, and retrieval of data in a major way. This revolution is driven by the generation and use of information in all forms and in enormous quantities, and it requires the development of intelligent systems for gathering, storing, and accessing information. This chapter describes the concepts, design, and implementation of a distributed web crawler that runs on a network of workstations and has been used for web information extraction. The crawler needs to scale to (at least) several hundred pages per second, be resilient against system crashes and other events, and be adaptable to various crawling applications. Further, this chapter focuses on various ways in which appropriate biological and bio-inspired tools can be used to implement the automatic location, understanding, and extraction of online data independent of the source, and to make that data available to Semantic Web agents such as a web crawler.
