Sign Language Recognition Using the Fusion of Image and Hand Landmarks Through Multi-Headed Convolutional Neural Network

Pathan, Refat Khan; Biswas, Munmun; Yasmin, Suraiya; Khandaker, Mayeen Uddin; Salman, Mohammad; Youssef, Ahmed A. F.

dc.contributor.author	Pathan, Refat Khan
dc.contributor.author	Biswas, Munmun
dc.contributor.author	Yasmin, Suraiya
dc.contributor.author	Khandaker, Mayeen Uddin
dc.contributor.author	Salman, Mohammad
dc.contributor.author	Youssef, Ahmed A. F.
dc.date.accessioned	2024-08-27T09:10:13Z
dc.date.available	2024-08-27T09:10:13Z
dc.date.issued	2023-10-09
dc.identifier.uri	http://dspace.daffodilvarsity.edu.bd:8080/handle/123456789/13236
dc.description.abstract	Sign language recognition is a breakthrough for communication within the deaf-mute community and has been a critical research topic for years. Although some previous studies have successfully recognized sign language, their approaches require many costly instruments, including sensors, devices, and high-end processing power. Such drawbacks can be overcome by employing artificial intelligence-based techniques. Because capturing video or images with a camera is now easy on modern mobile devices, this study demonstrates a cost-effective technique for detecting American Sign Language (ASL) from an image dataset. The “Finger Spelling, A” dataset is used, covering 24 letters (all except J and Z, which involve motion). This dataset was chosen mainly because its images have complex backgrounds with varied environments and scene colors. Two layers of image processing are used: in the first layer, the images are processed as a whole for training, and in the second layer, the hand landmarks are extracted. A multi-headed convolutional neural network (CNN) model is proposed to learn from these two layers, and it is tested on 30% of the dataset. To avoid overfitting, data augmentation and dynamic learning-rate reduction are used. The proposed model achieves a test accuracy of 98.981%. This study may help to develop an efficient human–machine communication system for the deaf-mute community.	en_US
dc.language.iso	en_US	en_US
dc.publisher	Springer Nature	en_US
dc.subject	Sign language	en_US
dc.subject	Neural networks	en_US
dc.subject	Communication	en_US
dc.title	Sign Language Recognition Using the Fusion of Image and Hand Landmarks Through Multi-Headed Convolutional Neural Network	en_US
dc.type	Article	en_US
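The abstract names dynamic learning-rate reduction as one of its overfitting countermeasures but does not give its settings. The sketch below illustrates the general "reduce on plateau" idea in plain Python: when the validation loss fails to improve for a set number of epochs, the learning rate is multiplied by a factor below one. The class name `PlateauLRScheduler`, its default values, and the example losses are hypothetical and are not taken from the paper. Deep learning frameworks ship this as a built-in (for example, the Keras `ReduceLROnPlateau` callback, which also supports a cooldown period omitted here).

```python
from dataclasses import dataclass


@dataclass
class PlateauLRScheduler:
    """Cut the learning rate when a monitored loss stops improving.

    Hypothetical helper: defaults are illustrative, not the paper's settings.
    """
    lr: float = 1e-3          # current learning rate (assumed starting value)
    factor: float = 0.5       # multiply lr by this on a plateau
    patience: int = 2         # epochs without improvement before reducing
    min_lr: float = 1e-6      # floor for the learning rate
    min_delta: float = 1e-4   # smallest decrease that counts as improvement
    best: float = float("inf")
    wait: int = 0

    def step(self, val_loss: float) -> float:
        """Record one epoch's validation loss and return the lr for the next epoch."""
        if val_loss < self.best - self.min_delta:
            # Genuine improvement: remember it and reset the patience counter.
            self.best = val_loss
            self.wait = 0
        else:
            # No improvement: count the stalled epoch and reduce lr once patience runs out.
            self.wait += 1
            if self.wait >= self.patience:
                self.lr = max(self.lr * self.factor, self.min_lr)
                self.wait = 0
        return self.lr


if __name__ == "__main__":
    sched = PlateauLRScheduler(lr=1e-3, factor=0.5, patience=2)
    # Made-up validation losses: two improving epochs, then four stalled ones.
    losses = [0.90, 0.70, 0.71, 0.72, 0.73, 0.74]
    for epoch, loss in enumerate(losses, start=1):
        print(epoch, loss, sched.step(loss))
```

With these made-up losses, the learning rate stays at 1e-3 for the first three epochs, halves to 5e-4 after two stalled epochs, and halves again to 2.5e-4 after two more. Smaller steps late in training tend to stop the optimizer from overshooting once the loss flattens, which is the usual rationale for pairing this schedule with augmentation.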