Visual Speech Recognition by Lip Reading Using Deep Learning

Author(s): V. Prakash (SASTRA University, India), R. Bhavani (SASTRA University, India), Durga Karthik (SASTRA University, India), D. Rajalakshmi (SASTRA University, India), N. Rajeswari (SASTRA University, India), and M. Martinaa (SASTRA University, India)
Copyright: 2024
Pages: 21
Source title: Advanced Applications in Osmotic Computing
Source Author(s)/Editor(s): G. Revathy (SASTRA University, India)
DOI: 10.4018/979-8-3693-1694-8.ch015

Keywords: Cloud Computing / Computer Science & IT / Engineering Science Reference / Systems and Software Engineering

Abstract

Visual speech recognition (VSR) uses image processing techniques to extract speech or text from facial features. Like audio speech recognition systems, lip reading (LR) systems are hampered by variation in facial characteristics, speaking rate, skin tone, and pronunciation. An LR system can be synchronised with an audio speech recognition system. In this work, lip movement data, also known as lip characteristics or visemes, are obtained from an input video clip saved in the cloud, and the lip features of each frame are extracted and stored. Training on clips with a varying number of frames prevents the training dataset from yielding suitable text matches. The system has two parts: a feature extraction approach that turns lip characteristics into a visual feature cube, and a Conv3D algorithm that matches words to their associated visemes. A precision of around 89% is reported for words. On the MIRACL-VC1 dataset, the 3D-CNN therefore performs better and offers higher classification accuracy than the prior system.
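
The two stages named in the abstract can be illustrated with a minimal Python sketch. It is not the chapter's implementation: this page gives no details of the feature extractor or the network, so every name below (`sample_frame_indices`, `build_feature_cube`, `conv3d_valid`) is hypothetical. The sketch assumes each frame is already a cropped grayscale mouth region of identical size, resamples a clip with any number of frames to a fixed depth, and applies one untrained 3D filter to the resulting cube.

```python
from typing import List

Frame = List[List[float]]  # one grayscale mouth crop: height x width
Cube = List[Frame]         # stacked frames: depth x height x width


def sample_frame_indices(num_frames: int, target: int) -> List[int]:
    """Pick `target` evenly spaced indices from a clip of `num_frames` frames."""
    if num_frames <= 0 or target <= 0:
        raise ValueError("num_frames and target must be positive")
    if target == 1:
        return [0]
    span, steps = num_frames - 1, target - 1
    # Integer rounding keeps the result deterministic and within range.
    return [(i * span + steps // 2) // steps for i in range(target)]


def build_feature_cube(frames: List[Frame], target: int) -> Cube:
    """Resample a variable-length clip to a fixed-depth cube (assumed step)."""
    height, width = len(frames[0]), len(frames[0][0])
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must share the same height and width")
    indices = sample_frame_indices(len(frames), target)
    return [[row[:] for row in frames[i]] for i in indices]


def conv3d_valid(cube: Cube, kernel: Cube) -> Cube:
    """Slide one 3D kernel over the cube (no padding, stride 1).

    As in deep learning libraries, the kernel is not flipped, so this is
    technically a cross-correlation.
    """
    d, h, w = len(cube), len(cube[0]), len(cube[0][0])
    kd, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out: Cube = []
    for z in range(d - kd + 1):
        plane = []
        for y in range(h - kh + 1):
            row = []
            for x in range(w - kw + 1):
                acc = 0.0
                for dz in range(kd):
                    for dy in range(kh):
                        for dx in range(kw):
                            acc += cube[z + dz][y + dy][x + dx] * kernel[dz][dy][dx]
                row.append(acc)
            plane.append(row)
        out.append(plane)
    return out
```

Fixing the cube depth before training is one plausible way to deal with the varying frame counts the abstract mentions. A real model would stack many learned Conv3D filters with pooling and a classifier on top, typically through a deep learning framework rather than nested loops.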
