Abstract:
Recently, Convolutional Neural Networks with 3D kernels (3D CNNs) have clearly
outperformed 2D CNNs in video processing applications. In the field of Stereoscopic
Video Quality Assessment (SVQA), 3D CNNs are utilized to extract the spatio-temporal
features from the stereoscopic video. Moreover, the emergence of large-scale video datasets
such as Kinetics has made it possible to use pre-trained 3D CNNs in other video-related
fields. In this paper, we fine-tune 3D Residual Networks (3D ResNets) pre-trained on the
Kinetics dataset for measuring the quality of stereoscopic videos and propose a no-reference
SVQA method. Specifically, our aim is twofold. First, we ask whether 3D CNNs can
serve as quality-aware feature extractors for stereoscopic videos. Second,
we explore which 3D ResNet architecture is most appropriate for SVQA. Experimental results
on two publicly available SVQA datasets, LFOVIAS3DPh2 and NAMA3DS1-COSPAD1,
show the effectiveness of the proposed transfer learning-based method for SVQA, which
achieves an RMSE of 0.332 on the LFOVIAS3DPh2 dataset. The results also show that deeper 3D
ResNet models extract more effective quality-aware features.
Keywords: 3D convolutional neural networks · Fine-tuning ·
Objective quality assessment · Pre-training · Stereoscopic video · Transfer learning
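
For readers unfamiliar with the RMSE figure quoted in the abstract, the metric is the square root of the mean squared difference between predicted quality scores and subjective scores. The following is a minimal sketch only: the function name `rmse` and the sample scores are hypothetical illustrations and are not taken from the paper or its datasets.

```python
import math


def rmse(predicted, subjective):
    """Root-mean-square error between predicted and subjective quality scores.

    Both arguments are equal-length sequences of numbers. The name and
    signature are illustrative assumptions, not the authors' code.
    """
    if len(predicted) != len(subjective):
        raise ValueError("score lists must have the same length")
    if len(predicted) == 0:
        raise ValueError("score lists must not be empty")
    # Squared difference for each (prediction, ground truth) pair.
    squared_errors = [(p - s) ** 2 for p, s in zip(predicted, subjective)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


if __name__ == "__main__":
    # Hypothetical scores, made up purely to show the call.
    model_scores = [3.1, 2.4, 4.0, 1.8]
    human_scores = [3.0, 2.9, 3.7, 2.0]
    print(f"RMSE: {rmse(model_scores, human_scores):.3f}")
```

A lower RMSE means the predicted scores lie closer to the subjective ones, which is the sense in which the abstract reports 0.332 as evidence of effectiveness.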