IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Image Captioning Made Easy: Leveraging Vision Transformers and GPT-2 to Create Accurate and Coherent Descriptions From Images

Author(s): Ayesha Taranum (Vidyavardhaka College of Engineering, India) and Mohammed Ezhan (Northeastern University, USA)
Copyright: 2026
Pages: 14
Source title: AI-Based Data Mobility and Intelligent Modeling for Smart Cities
Source Author(s)/Editor(s): Sultan Ahmad (Prince Sattam Bin Abdulaziz University, Saudi Arabia), Sudan Jha (Kathmandu University, Nepal) and Md Alimul Haque (Veer Kunwar Singh University, India)
DOI: 10.4018/979-8-3373-4202-3.ch011

Abstract

Image captioning, the generation of descriptive text from image content, has drawn considerable interest in computer vision and natural language processing (NLP). This research proposes a Python application that combines Vision Transformers (ViT) and GPT-2 for automatic image captioning. The system employs the pre-trained nlpconnect/vit-gpt2-image-captioning model from Hugging Face, coupled with a graphical user interface (GUI) built with Tkinter. The model efficiently extracts features from images and produces coherent, contextually appropriate captions, showing improvement over conventional Convolutional Neural Network-Long Short-Term Memory (CNN-LSTM) based models. This study emphasises the architecture, methodology, and comparative evaluation of the system, highlighting its applicability to real-world uses such as accessibility for visually impaired users, content management, and image retrieval. Performance measurements suggest that the model can produce high-quality captions efficiently.
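
The pipeline described in the abstract (a pre-trained ViT encoder and GPT-2 decoder from Hugging Face, fronted by a Tkinter GUI) might look roughly like the sketch below. It is not the chapter's implementation. The function names `load_captioner`, `tidy_caption`, and `run_gui`, the generation settings (`max_length=16`, `num_beams=4`), and the window layout are all illustrative assumptions. The sketch also assumes that the `transformers`, `torch`, and `Pillow` packages are installed and that the model weights can be downloaded on first use.

```python
from typing import Callable

# Model identifier as published on the Hugging Face Hub.
MODEL_ID = "nlpconnect/vit-gpt2-image-captioning"


def tidy_caption(text: str) -> str:
    """Hypothetical helper: normalise whitespace, capitalise, end with a period."""
    words = text.split()
    if not words:
        return ""
    sentence = " ".join(words)
    sentence = sentence[0].upper() + sentence[1:]
    if sentence[-1] not in ".!?":
        sentence += "."
    return sentence


def load_captioner(model_id: str = MODEL_ID) -> Callable[[str], str]:
    """Load the model once and return a function mapping image path -> caption."""
    # Heavy dependencies are imported lazily so this module imports quickly.
    from PIL import Image
    from transformers import (
        AutoTokenizer,
        ViTImageProcessor,
        VisionEncoderDecoderModel,
    )

    model = VisionEncoderDecoderModel.from_pretrained(model_id)
    processor = ViTImageProcessor.from_pretrained(model_id)
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    def caption(image_path: str) -> str:
        # The ViT encoder expects RGB input; convert in case of grayscale/RGBA.
        image = Image.open(image_path).convert("RGB")
        pixel_values = processor(images=image, return_tensors="pt").pixel_values
        # Assumed settings: short captions decoded with beam search.
        output_ids = model.generate(pixel_values, max_length=16, num_beams=4)
        text = tokenizer.batch_decode(output_ids, skip_special_tokens=True)[0]
        return tidy_caption(text)

    return caption


def run_gui(captioner: Callable[[str], str]) -> None:
    """Minimal Tkinter window: pick an image file, display its caption."""
    import tkinter as tk
    from tkinter import filedialog

    root = tk.Tk()
    root.title("Image Captioning")
    result = tk.Label(root, text="Choose an image to caption.", wraplength=400)

    def on_open() -> None:
        path = filedialog.askopenfilename(
            filetypes=[("Images", "*.png *.jpg *.jpeg")]
        )
        if path:
            result.config(text=captioner(path))

    tk.Button(root, text="Open image...", command=on_open).pack(padx=10, pady=10)
    result.pack(padx=10, pady=10)
    root.mainloop()
```

Under these assumptions, calling `run_gui(load_captioner())` would start the application. Loading the model once and passing the resulting function into the GUI avoids reloading the weights for every image.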
