IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Semantics-Based Document Categorization Employing Semi-Supervised Learning

Semantics-Based Document Categorization Employing Semi-Supervised Learning
View Sample PDF
Author(s): Jan Žižka (Mendel University in Brno, Czech Republic)and František Dařena (Mendel University in Brno, Czech Republic)
Copyright: 2017
Pages: 29
Source title: Artificial Intelligence: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-5225-1759-7.ch077

Purchase

View Semantics-Based Document Categorization Employing Semi-Supervised Learning on the publisher's website for pricing and purchasing information.

Abstract

The automated categorization of unstructured textual documents according to their semantic contents plays important role particularly linked with the ever growing volume of such data originating from the Internet. Having a sufficient number of labeled examples, a suitable supervised machine learning-based classifier can be trained. When no labeling is available, an unsupervised learning method can be applied, however, the missing label information often leads to worse classification results. This chapter demonstrates a method based on semi-supervised learning when a smallish set of manually labeled examples improves the categorization process in comparison with clustering, and the results are comparable with the supervised learning output. For the illustration, a real-world dataset coming from the Internet is used as the input of the supervised, unsupervised, and semi-supervised learning. The results are shown for different number of the starting labeled samples used as “seeds” to automatically label the remaining volume of unlabeled items.

Related Content

Kamel Mouloudj, Vu Lan Oanh LE, Achouak Bouarar, Ahmed Chemseddine Bouarar, Dachel Martínez Asanza, Mayuri Srivastava. © 2024. 20 pages.
José Eduardo Aleixo, José Luís Reis, Sandrina Francisca Teixeira, Ana Pinto de Lima. © 2024. 52 pages.
Jorge Figueiredo, Isabel Oliveira, Sérgio Silva, Margarida Pocinho, António Cardoso, Manuel Pereira. © 2024. 24 pages.
Fatih Pinarbasi. © 2024. 20 pages.
Stavros Kaperonis. © 2024. 25 pages.
Thomas Rui Mendes, Ana Cristina Antunes. © 2024. 24 pages.
Nuno Geada. © 2024. 12 pages.
Body Bottom