Voice-Based Image Captioning System for Assisting Visually Impaired People Using Neural Networks

Author(s): Nivedita M. (Vellore Institute of Technology, Chennai, India), Asnath Victy Phamila Y. (Vellore Institute of Technology, Chennai, India), Umashankar Kumaravelan (Independent Researcher, India) and Karthikeyan N. (Syed Ammal Engineering College, India)
Copyright: 2023
Pages: 23
Source title: Principles and Applications of Socio-Cognitive and Affective Computing
Source Author(s)/Editor(s): S. Geetha (Vellore Institute of Technology, Chennai, India), Karthika Renuka (PSG College of Technology, India), Asnath Victy Phamila (Vellore Institute of Technology, Chennai, India) and Karthikeyan N. (Syed Ammal Engineering College, India)
DOI: 10.4018/978-1-6684-3843-5.ch011

Abstract

Many people worldwide live with visual impairment. The authors propose a novel image captioning model for assisting blind people, based on a deep learning architecture. Automatically understanding an image and producing a description of it involves tasks from two complex fields: computer vision and natural language processing. The first task is to correctly identify the objects present in the given image, along with their attributes; the next is to connect the identified objects and actions and generate statements that are syntactically correct. From the real-time video, features are extracted using a convolutional neural network (CNN), and the feature vectors are given as input to a long short-term memory (LSTM) network to generate appropriate captions in a natural language (English). The captions can then be converted into audio files to which visually impaired users can listen. The model is tested on two standard image captioning datasets, Flickr8K and MSCOCO, and evaluated using the BLEU score.
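The pipeline the abstract describes (CNN feature extraction, LSTM caption generation, text-to-speech conversion, and BLEU evaluation) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the choice of InceptionV3 as the encoder, all layer sizes, and the gTTS text-to-speech library are assumptions, and vocab_size and max_len are dataset-dependent placeholders.

import numpy as np
from tensorflow.keras.applications.inception_v3 import InceptionV3, preprocess_input
from tensorflow.keras.preprocessing.image import load_img, img_to_array
from tensorflow.keras.layers import Input, Dense, Dropout, Embedding, LSTM, add
from tensorflow.keras.models import Model
from gtts import gTTS
from nltk.translate.bleu_score import corpus_bleu

# 1. CNN encoder: a pretrained network with its classification head
#    removed turns each frame into a fixed-length feature vector
#    (2048-d for InceptionV3, which is assumed here).
base = InceptionV3(weights="imagenet")
encoder = Model(base.input, base.layers[-2].output)

def extract_features(image_path):
    img = img_to_array(load_img(image_path, target_size=(299, 299)))
    return encoder.predict(preprocess_input(img[np.newaxis]), verbose=0)[0]

# 2. LSTM decoder: merges the image feature vector with the partial
#    caption seen so far and predicts the next word. vocab_size and
#    max_len would be set from the training captions.
vocab_size, max_len = 8000, 34
img_in = Input(shape=(2048,))
img_vec = Dense(256, activation="relu")(Dropout(0.5)(img_in))
seq_in = Input(shape=(max_len,))
seq_vec = LSTM(256)(Embedding(vocab_size, 256, mask_zero=True)(seq_in))
merged = Dense(256, activation="relu")(add([img_vec, seq_vec]))
decoder = Model([img_in, seq_in], Dense(vocab_size, activation="softmax")(merged))
decoder.compile(loss="categorical_crossentropy", optimizer="adam")

# 3. Speech output: gTTS is one possible route (an assumption; the
#    chapter only states that captions are converted to audio files).
gTTS("a dog runs on the grass", lang="en").save("caption.mp3")

# 4. Evaluation: corpus-level BLEU of generated captions against the
#    Flickr8K/MSCOCO reference captions.
references = [[["a", "dog", "runs", "on", "grass"],
               ["the", "dog", "runs", "on", "the", "grass"]]]  # per-image reference lists
hypotheses = [["a", "dog", "runs", "on", "the", "grass"]]
print(corpus_bleu(references, hypotheses))

At inference time, captioning is typically a word-by-word loop: the decoder is fed the image features plus the caption generated so far, the highest-probability word is appended, and the loop repeats until an end token is produced or max_len is reached. Only the final string is passed to the text-to-speech step.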

Related Content

Hemalatha J. J., Bala Subramanian Chokkalingam, Vivek V., Sekar Mohan. © 2023. 14 pages.
R. Muthuselvi, G. Nirmala. © 2023. 12 pages.
Jerritta Selvaraj, Arun Sahayadhas. © 2023. 16 pages.
Vidhya R., Sandhia G. K., Jansi K. R., Nagadevi S., Jeya R. © 2023. 8 pages.
Shanthalakshmi Revathy J., Uma Maheswari N., Sasikala S. © 2023. 13 pages.
Uma N. Dulhare, Shaik Rasool. © 2023. 29 pages.
R. Nareshkumar, G. Suseela, K. Nimala, G. Niranjana. © 2023. 22 pages.