Information Resources Management Association

Models and Approaches for Web Information Extraction and Web Page Understanding

Author(s): Ruslan R. Fayzrakhmanov (Vienna University of Technology, Austria)
Copyright: 2015
Pages: 26
Source title: The Evolution of the Internet in the Business Sector: Web 1.0 to Web 3.0
Source Author(s)/Editor(s): Pedro Isaías (Universidade Aberta (Portuguese Open University), Portugal), Piet Kommers (University of Twente, The Netherlands), and Tomayess Issa (Curtin University, Australia)
DOI: 10.4018/978-1-4666-7262-8.ch002

Abstract

This chapter discusses the main challenges addressed in the fields of Web information extraction and Web page understanding and considers the different Web page representations used in them. Building on this analysis, it presents WPPS, a configurable Java-based framework for implementing effective Web Page Processing (WPP) methods. WPPS leverages a Unified Ontological Model (UOM) of Web pages that describes their different aspects, such as layout, visual features, interface, DOM tree, and logical structure, in one consistent model. The UOM formalizes certain layers of a Web page conceptualization defined in the chapter. The WPPS API provided for developing WPP methods makes it possible to combine a declarative approach, represented by a set of inference rules and SPARQL queries, with an object-oriented approach. The framework is illustrated with an example scenario: identifying the navigation menu of a Web page.
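
To make the idea of combining declarative rules with an object-oriented model more concrete, the following is a minimal sketch in plain Java. It is not the WPPS API and not the chapter's method: the `Block` class, the link-density and position thresholds, and the `findMenuCandidates` helper are all hypothetical, chosen only to show how small, composable rules might be applied to objects that merge DOM and layout facts about a page.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

public class NavMenuSketch {

    /** Hypothetical, simplified page block combining DOM, layout, and content facts. */
    static final class Block {
        final String tag;     // DOM aspect: element name
        final int y;          // layout aspect: top offset in pixels
        final int linkCount;  // number of hyperlinks inside the block
        final int wordCount;  // number of visible words inside the block

        Block(String tag, int y, int linkCount, int wordCount) {
            this.tag = tag;
            this.y = y;
            this.linkCount = linkCount;
            this.wordCount = wordCount;
        }
    }

    // Declarative-style rules expressed as composable predicates.
    // The thresholds are illustrative assumptions, not values from the chapter.
    static final Predicate<Block> LINK_DENSE =
            b -> b.linkCount >= 3 && b.wordCount <= 3 * b.linkCount;
    static final Predicate<Block> NEAR_TOP = b -> b.y < 300;
    static final Predicate<Block> MENU_RULE = LINK_DENSE.and(NEAR_TOP);

    /** Object-oriented side: walk the page model and keep blocks that satisfy the rule. */
    static List<Block> findMenuCandidates(List<Block> page) {
        List<Block> result = new ArrayList<>();
        for (Block b : page) {
            if (MENU_RULE.test(b)) {
                result.add(b);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Block> page = List.of(
                new Block("ul", 40, 6, 8),      // link-dense list near the top
                new Block("div", 400, 2, 250),  // article body: few links, much text
                new Block("ul", 1800, 5, 7));   // link-dense footer, far from the top
        for (Block b : findMenuCandidates(page)) {
            System.out.println(b.tag + " at y=" + b.y);  // prints "ul at y=40"
        }
    }
}
```

In a framework of the kind the abstract describes, rules like `MENU_RULE` would presumably be stated as inference rules or SPARQL queries over the ontological model rather than as Java predicates; the predicates here only stand in for that declarative layer.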
