IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Ontology Based Feature Extraction From Text Documents

Author(s): Abirami A.M (Thiagarajar College of Engineering, India), Askarunisa A. (KLN College of Information Technology, India), Shiva Shankari R A (Thiagarajar College of Engineering, India), and Revathy R. (Thiagarajar College of Engineering, India)
Copyright: 2018
Pages: 22
Source title: Applications of Security, Mobile, Analytic, and Cloud (SMAC) Technologies for Effective Information Processing and Management
Source Author(s)/Editor(s): P. Karthikeyan (Thiagarajar College of Engineering, India) and M. Thangavel (Thiagarajar College of Engineering, India)
DOI: 10.4018/978-1-5225-4044-1.ch009

Abstract

This article argues that semantic annotation is essential for categorizing labeled or unlabeled text documents. The accuracy of document categorization can be greatly improved if documents are indexed or modeled using semantics rather than the traditional term-frequency model. Semantic annotation brings its own challenges, such as synonymy and polysemy. The proposed model builds a domain ontology for the textual content so that synonymy and polysemy in text analysis are resolved to a greater extent. Latent Dirichlet Allocation (LDA), a topic modeling technique, is used to extract features from the documents. Using domain knowledge of the concepts together with the features grouped by LDA, the domain ontology is built hierarchically. Empirical results show that LDA is a better feature extraction technique for text documents than TF or TF-IDF indexing. The proposed model also improves the accuracy of document categorization when the domain ontology built using LDA is used for document indexing.
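
To make the contrast between term-frequency indexing and ontology-based indexing concrete, the following is a minimal sketch, not the chapter's implementation. It assumes a tiny hand-written ontology (the names TERM_TO_CONCEPT and CONCEPT_PARENT, the concepts, and the two sample sentences are all hypothetical) and shows how mapping synonyms to shared concepts lets two documents with almost no common terms be indexed identically. It does not cover the LDA step or polysemy, both of which would need context-aware handling.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy ontology (an assumption for illustration only):
# each surface term maps to a concept, and each concept to its parent.
TERM_TO_CONCEPT = {
    "phone": "Mobile", "handset": "Mobile",
    "screen": "Display", "display": "Display",
    "battery": "Battery", "charge": "Battery",
}
CONCEPT_PARENT = {
    "Mobile": "Device",
    "Display": "Component",
    "Battery": "Component",
    "Component": "Device",
}

def tokenize(text):
    """Lowercase, split on whitespace, and strip simple punctuation."""
    tokens = (w.strip(".,;:!?").lower() for w in text.split())
    return [t for t in tokens if t]

def term_index(text):
    """Traditional term-frequency index: one dimension per surface term."""
    return Counter(tokenize(text))

def concept_index(text):
    """Ontology-based index: count each matched concept and its ancestors."""
    counts = Counter()
    for tok in tokenize(text):
        concept = TERM_TO_CONCEPT.get(tok)
        while concept is not None:
            counts[concept] += 1
            concept = CONCEPT_PARENT.get(concept)  # walk up the hierarchy
    return counts

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

doc_a = "The handset has a bright screen."
doc_b = "This phone has a sharp display."

term_sim = cosine(term_index(doc_a), term_index(doc_b))
concept_sim = cosine(concept_index(doc_a), concept_index(doc_b))
print(f"term-frequency similarity: {term_sim:.3f}")
print(f"concept-level similarity:  {concept_sim:.3f}")
```

In this sketch the two sentences share only function words at the term level, yet they receive the same concept-level index because "handset"/"phone" and "screen"/"display" resolve to the same concepts. In the approach the abstract describes, the term groupings would come from LDA topics combined with domain knowledge rather than from a hand-written dictionary.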
