
Revealing Groups of Semantically Close Textual Documents by Clustering: Problems and Possibilities

Author(s): František Dařena (Mendel University in Brno, Czech Republic) and Jan Žižka (Mendel University in Brno, Czech Republic)
Copyright: 2017
Pages: 40
Source title: Artificial Intelligence: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-5225-1759-7.ch081

Abstract

The chapter introduces clustering as a family of algorithms that can organize text documents into groups without prior knowledge of those groups. It also demonstrates the use of unsupervised clustering to group a large amount of unlabeled textual data (customer reviews written informally in five natural languages) so that the data can later be analyzed further. Attention is paid to the selection of clustering algorithms and their parameters, to methods of data preprocessing, and to methods by which a human expert, assisted by a computer, can evaluate the results. Feasibility was demonstrated in a number of experiments, using external evaluation against known labels and computer-assisted expert validation. The same procedures, including clustering, cluster validation, and detection of topics and significant words, were found to be applicable to different natural languages with satisfactory results.
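
The abstract does not name the algorithms, parameters, or preprocessing steps the chapter uses, so the following is only a minimal sketch of the general idea: cluster short reviews without labels, then score the result against known labels (external evaluation). Everything in it is an assumption made for illustration: the toy reviews, the whitespace tokenizer, the cosine-based k-means variant, the hand-picked `seeds`, and purity as the external measure. It should not be read as the authors' procedure.

```python
import math
from collections import Counter


def tf_vector(text):
    """L2-normalised term-frequency vector (toy preprocessing: lowercase + split)."""
    counts = Counter(text.lower().split())
    norm = math.sqrt(sum(c * c for c in counts.values())) or 1.0
    return {term: c / norm for term, c in counts.items()}


def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(term, 0.0) for term, w in a.items())


def centroid(vectors):
    """Unit-length mean direction of a non-empty list of sparse vectors."""
    total = {}
    for v in vectors:
        for term, w in v.items():
            total[term] = total.get(term, 0.0) + w
    norm = math.sqrt(sum(w * w for w in total.values())) or 1.0
    return {term: w / norm for term, w in total.items()}


def kmeans(vectors, seeds, max_iter=20):
    """Cosine-based k-means; `seeds` are indices of the initial centroids (assumed)."""
    centroids = [vectors[i] for i in seeds]
    assignment = None
    for _ in range(max_iter):
        new_assignment = [
            max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
            for v in vectors
        ]
        if new_assignment == assignment:  # converged: no document moved
            break
        assignment = new_assignment
        for c in range(len(centroids)):
            members = [v for v, a in zip(vectors, assignment) if a == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = centroid(members)
    return assignment


def purity(assignment, labels):
    """External evaluation: share of documents carrying their cluster's majority label."""
    majority_total = 0
    for c in set(assignment):
        in_cluster = [lab for a, lab in zip(assignment, labels) if a == c]
        majority_total += Counter(in_cluster).most_common(1)[0][1]
    return majority_total / len(labels)


# Hypothetical reviews and labels, invented for this sketch.
docs = [
    "great room clean hotel friendly staff",
    "battery died phone screen cracked",
    "hotel staff friendly room quiet",
    "phone battery weak screen dim",
    "clean quiet room great hotel",
    "screen cracked phone battery died fast",
]
labels = ["hotel", "phone", "hotel", "phone", "hotel", "phone"]

vectors = [tf_vector(d) for d in docs]
assignment = kmeans(vectors, seeds=(0, 1))
score = purity(assignment, labels)
print(assignment, score)
```

The labels are used only in `purity`, never in `kmeans`, which mirrors the distinction the abstract draws between unsupervised grouping and evaluation with known labels. Real review data would need language-aware preprocessing and a less fragile initialization than fixed seeds.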
