Abstract:
Human emotions are spontaneous mental states that manifest through changes in facial muscles,
producing expressions. Many human-computer interaction applications employ nonverbal
communication techniques such as facial expressions, eye movements, and gestures.
Facial expression, in particular, is widely used to convey an individual's
emotional state and feelings. However, emotion recognition is challenging:
facial expressions must be distinguished clearly, and emotions themselves are complex
and variable. Conventional machine learning algorithms frequently struggle to
recognize emotions accurately because they depend heavily on human-generated features. To address this issue, we explored the use of deep learning models
for emotion detection based on facial expressions. Specifically, we evaluated Vision
Transformer (ViT), VGG19, InceptionV3, EfficientNet, and ResNet50 models. The
findings of our study demonstrated that Vision Transformer (ViT) achieved the highest
accuracy rate of 82.96%, followed by EfficientNet at 82.36%, ResNet50 at 80.87%,
InceptionV3 at 79%, and VGG19 at 78.22%. Given its superior accuracy and
robustness, we propose the Vision Transformer (ViT) for identifying six
distinct emotions: anger, neutrality, happiness, sadness, disgust, and surprise.
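To illustrate the proposed approach, the following is a minimal sketch of fine-tuning a pretrained Vision Transformer for six-class emotion recognition. The torchvision vit_b_16 backbone, 224x224 input size, and optimizer settings are illustrative assumptions, not the paper's exact configuration.

```python
# Minimal sketch: fine-tuning a pretrained ViT for six-class facial emotion
# recognition. The vit_b_16 backbone, input size, and hyperparameters are
# assumptions for illustration, not the paper's reported setup.
import torch
import torch.nn as nn
from torchvision import models, transforms

EMOTIONS = ["anger", "neutrality", "happiness", "sadness", "disgust", "surprise"]

# Load an ImageNet-pretrained ViT-B/16 and replace its classification head
# with a six-way linear layer for the emotion classes.
model = models.vit_b_16(weights=models.ViT_B_16_Weights.IMAGENET1K_V1)
model.heads.head = nn.Linear(model.heads.head.in_features, len(EMOTIONS))

# Standard ImageNet preprocessing; face crops are resized to 224x224.
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """Run one gradient step on a batch of preprocessed face images."""
    model.train()
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

A hypothetical training loop would feed batches of face crops through `preprocess` and call `train_step`; at inference, `model(images).argmax(dim=1)` indexes into EMOTIONS.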