Facial Expression Recognition Based on Improved Octave Convolution Algorithm
-
Abstract: To address the low accuracy, high model complexity, and heavy computation of current facial expression recognition methods, this paper proposes a facial expression recognition model based on improved octave convolution (OCNN). The improved octave convolution is used for feature extraction; it improves the extraction of detailed features, reduces redundancy in the feature maps, and lowers the amount of computation without increasing the number of parameters. The DyReLU activation function is used to strengthen the learning and representational capacity of the model. An adaptive average pooling downsampling layer replaces the fully connected layer to reduce the number of parameters. The model is pre-trained on a large-scale dataset and then evaluated on the FER2013, FERPlus, and RAF-DB datasets. The experimental results show that the trained model weights occupy only 10.4 MB and that the model reaches accuracies of 73.53%, 89.58%, and 88.50% on FER2013, FERPlus, and RAF-DB, respectively. Compared with current models, OCNN achieves high accuracy with low computational resource consumption, which demonstrates the effectiveness of the model.
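The claim that octave convolution lowers computation without adding parameters can be illustrated with the cost model of the original octave convolution [18]. The sketch below is only an illustration under stated assumptions, not the paper's improved OCNN layer: it assumes that a fraction `alpha` of both input and output channels is low-frequency and stored at half the spatial resolution, and the function names are hypothetical.

```python
def octave_cost_ratio(alpha: float) -> float:
    """Multiply-adds of one octave convolution relative to a vanilla convolution.

    Assumes the original octave design [18]: a fraction `alpha` of input and
    output channels is low-frequency and kept at half resolution (1/4 the pixels).
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    high = 1.0 - alpha
    h2h = high * high          # high -> high, computed at full resolution
    h2l = high * alpha / 4.0   # high -> low, convolved after pooling to low resolution
    l2h = alpha * high / 4.0   # low -> high, convolved at low resolution, then upsampled
    l2l = alpha * alpha / 4.0  # low -> low, computed at low resolution
    return h2h + h2l + l2h + l2l


def octave_param_ratio(alpha: float) -> float:
    """Weights of one octave convolution relative to a vanilla convolution."""
    high = 1.0 - alpha
    # The four kernel blocks together tile the full c_in x c_out weight tensor.
    return high * high + 2.0 * high * alpha + alpha * alpha


if __name__ == "__main__":
    for alpha in (0.0, 0.25, 0.5, 0.75):
        print(f"alpha={alpha:.2f}  cost={octave_cost_ratio(alpha):.4f}  "
              f"params={octave_param_ratio(alpha):.4f}")
```

Under these assumptions the parameter ratio stays at 1 for every `alpha` while the cost ratio changes with it, which is consistent with Table 1, where all four OCNN variants have the same parameter count but different multiply-add counts.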
-
Table 1. Performance comparison of models with different proportions of low-frequency components
Model    Parameters/(×10^6)  Multiply-adds/(×10^9)  FLOPs/(×10^9)  RAF-DB accuracy/%
OCNN-a   2.7                 2.65                   1.33           86.05
OCNN-b   2.7                 2.15                   1.08           86.99
OCNN-c   2.7                 1.65                   0.83           87.19
OCNN-d   2.7                 1.15                   0.58           86.44
Table 2. Comparison of the original octave convolution model and the OCNN-c model
Model                              Parameters/(×10^6)  FER2013 accuracy/%  FERPlus accuracy/%  RAF-DB accuracy/%
Original octave convolution model  2.9                 72.56               88.49               86.34
OCNN-c                             2.7                 72.64               88.62               87.19
Table 3. Accuracy of the OCNN-c model with different activation functions
Model     Parameters/(×10^6)  FER2013 accuracy/%  FERPlus accuracy/%  RAF-DB accuracy/%
ReLU-a    8.9                 72.47               88.46               86.63
DyReLU-a  9.3                 72.44               88.49               86.79
ReLU-b    2.3                 72.11               88.42               86.34
DyReLU-b  2.7                 72.64               88.62               87.19
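For readers unfamiliar with DyReLU [22], which Table 3 compares against ReLU, the following sketch shows the piecewise-linear form it uses: the output is the maximum of several linear functions of the input, whose slopes and intercepts are produced from the input by a small hyperfunction. The hyperfunction is omitted here and the coefficients are passed in directly; the function name and the example coefficient values are illustrative assumptions, not the settings used in OCNN.

```python
from typing import Sequence


def dyrelu(x: float, slopes: Sequence[float], intercepts: Sequence[float]) -> float:
    """Dynamic ReLU for one activation: max over k of slopes[k] * x + intercepts[k].

    In DyReLU [22] the coefficients depend on the input feature map; here they
    are supplied by the caller (hypothetical values) to keep the sketch small.
    """
    if len(slopes) != len(intercepts) or not slopes:
        raise ValueError("need the same non-zero number of slopes and intercepts")
    return max(a * x + b for a, b in zip(slopes, intercepts))


# Plain ReLU is the special case with slopes (1, 0) and intercepts (0, 0).
relu_like = [dyrelu(x, (1.0, 0.0), (0.0, 0.0)) for x in (-2.0, 0.0, 3.0)]

# A different coefficient set gives a leaky-ReLU-like response.
leaky_like = [dyrelu(x, (1.0, 0.25), (0.0, 0.0)) for x in (-2.0, 0.0, 3.0)]
```

Because the coefficients can change from one input to the next, a single layer can behave like ReLU for some inputs and like a leaky variant for others, at the cost of the extra hyperfunction parameters.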
Table 4. Comparison with mainstream network models
Model             Input size  Parameters/(×10^6)  FER2013 accuracy/%  FERPlus accuracy/%  RAF-DB accuracy/%  Multiply-adds/(×10^9)  Storage/MB
AlexNet[15]       227×227×3   57.0                71.55               87.40               84.62              1.67                   228.1
VGG16[16]         224×224×3   134.3               72.72               88.17               86.96              30.95                  537.2
ResNet-18[17]     224×224×3   11.2                71.19               87.79               85.92              3.64                   44.8
DenseNet-121[26]  224×224×3   7.0                 71.66               88.59               84.58              5.74                   28.5
OCNN              90×90×1     2.7                 72.64               88.62               87.19              1.65                   10.4
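The storage column of Table 4 is close to what 32-bit weights would occupy for the listed parameter counts. The sketch below makes that check explicit. The assumptions that weights are stored as float32 and that 1 MB equals 10^6 bytes are not stated in the source, and the helper name is hypothetical.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats


def estimated_weight_size_mb(params_millions: float) -> float:
    """Estimate weight-file size in MB (10^6 bytes) from a parameter count in millions."""
    return params_millions * 1e6 * BYTES_PER_FLOAT32 / 1e6


# (parameters in millions, reported storage in MB), copied from Table 4
table4 = {
    "AlexNet": (57.0, 228.1),
    "VGG16": (134.3, 537.2),
    "ResNet-18": (11.2, 44.8),
    "DenseNet-121": (7.0, 28.5),
    "OCNN": (2.7, 10.4),
}

for name, (params, reported) in table4.items():
    estimate = estimated_weight_size_mb(params)
    print(f"{name:13s} estimated {estimate:6.1f} MB, reported {reported:6.1f} MB")
```

The estimates fall within about 0.5 MB of the reported values. The remaining gaps are plausibly due to rounding of the parameter counts and to non-weight data stored alongside the weights.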
Table 5. Experimental results of different models on the FER2013 dataset
Table 6. Experimental results of different models on the FERPlus dataset
-
[1] SHAN C F, GONG S G, MCOWAN P W. Facial expression recognition based on Local Binary Patterns: a comprehensive study[J]. Image and Vision Computing, 2009, 27(6): 803-816. doi: 10.1016/j.imavis.2008.08.005
[2] MAHMUD F, ISLAM B, HOSSAIN A, et al. Facial region segmentation based emotion recognition using K-Nearest Neighbors[C]//Proceedings of 2018 International Conference on Innovation in Engineering and Technology. Dhaka: IEEE, 2018: 1-5.
[3] ZHI R C, FLIERL M, RUAN Q Q, et al. Graph-preserving sparse nonnegative matrix factorization with application to facial expression recognition[J]. IEEE Transactions on Systems, Man, and Cybernetics: Part B, 2011, 41(1): 38-52. doi: 10.1109/TSMCB.2010.2044788
[4] ZHANG Z Y, MU X M, GAO L. Recognizing facial expressions based on Gabor filter selection[C]//Proceedings of International Congress on Image and Signal Processing. Shanghai: IEEE, 2011: 1544-1548.
[5] HU M, TENG W D, WANG X H, et al. Facial expression recognition based on local texture and shape features[J]. Journal of Electronics & Information Technology, 2018, 40(6): 1338-1344. https://www.cnki.com.cn/Article/CJFDTOTAL-DZYX201806010.htm
[6] MATSUGU M, MORI K, MITARI Y, et al. Subject independent facial expression recognition with robust face detection using a convolutional neural network[J]. Neural Networks, 2003, 16(5/6): 555-559.
[7] SUN B, LI L, ZHOU G, et al. Facial expression recognition in the wild based on multimodal texture features[J]. Journal of Electronic Imaging, 2016, 25(6): 061407/1-8.
[8] HUANG C. Combining convolutional neural networks for emotion recognition[C]//Proceedings of 2017 IEEE MIT Undergraduate Research Technology Conference. Cambridge: IEEE, 2017: 1-4.
[9] FENG Y, LIU R, LU T. Facial expression recognition based on small-scale kernel convolution[J]. Computer Engineering, 2021, 47(4): 262-267. doi: 10.19678/j.issn.1000-3428.0056775
[10] GEORGESCU M I, IONESCU R T, POPESCU M. Local learning with deep and handcrafted features for facial expression recognition[J]. IEEE Access, 2019, 7: 64827-64836. doi: 10.1109/ACCESS.2019.2917266
[11] WANG K, PENG X, YANG J, et al. Suppressing uncertainties for large-scale facial expression recognition[C]//Proceedings of 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 6897-6906.
[12] LI H, WANG N, DING X, et al. Adaptively learning facial expression representation via C-F labels and distillation[J]. IEEE Transactions on Image Processing, 2021, 30: 2016-2028. doi: 10.1109/TIP.2021.3049955
[13] CUI Z Y, PI J T, CHEN Y, et al. Facial expression recognition combined with improved VGGNet and Focal Loss[J]. Computer Engineering and Applications, 2021, 57(19): 171-178. doi: 10.3778/j.issn.1002-8331.2007-0492
[14] XUE F, WANG Q, GUO G. TransFER: learning relation-aware facial expression representations with transformers[C]//Proceedings of 2021 IEEE/CVF International Conference on Computer Vision. Montreal: IEEE, 2021: 3581-3590.
[15] KRIZHEVSKY A, SUTSKEVER I, HINTON G E. Imagenet classification with deep convolutional neural networks[J]. Communications of the ACM, 2017, 60(6): 84-90. doi: 10.1145/3065386
[16] SIMONYAN K, ZISSERMAN A. Very deep convolutional networks for large-scale image recognition[J/OL]. arXiv, (2015-04-10)[2021-12-16]. https://arxiv.org/abs/1409.1556.
[17] HE K, ZHANG X, REN S, et al. Deep residual learning for image recognition[C]//Proceedings of 2016 IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 770-778.
[18] CHEN Y, FAN H, XU B, et al. Drop an Octave: reducing spatial redundancy in convolutional neural networks with Octave Convolution[C]//Proceedings of 2019 IEEE/CVF International Conference on Computer Vision. Seoul: IEEE, 2020: 3434-3443.
[19] MALLAT S G. A wavelet tour of signal processing[M]. New York: Academic Press, 1999.
[20] NAIR V, HINTON G E. Rectified linear units improve restricted Boltzmann machines[C]//Proceedings of the 27th International Conference on Machine Learning (ICML-10). Haifa: Omnipress, 2010: 807-814.
[21] HE K, ZHANG X, REN S, et al. Delving deep into rectifiers: surpassing human-level performance on ImageNet classification[C]//Proceedings of 2015 IEEE International Conference on Computer Vision. Santiago: IEEE, 2015: 1026-1034.
[22] CHEN Y, DAI X, LIU M, et al. Dynamic ReLU[C]//Proceedings of Computer Vision-ECCV 2020: 16th European Conference. Berlin: Springer, 2020: 351-367.
[23] GOODFELLOW I J, ERHAN D, CARRIER P L, et al. Challenges in representation learning: a report on three machine learning contests[C]//Neural Information Processing: 20th International Conference, ICONIP 2013. Berlin: Springer, 2013: 117-124.
[24] BARSOUM E, ZHANG C, FERRER C C, et al. Training deep networks for facial expression recognition with crowd-sourced label distribution[C]//Proceedings of the 18th ACM International Conference on Multimodal Interaction. New York: ACM, 2016: 279-283.
[25] LI S, DENG W, DU J P. Reliable crowdsourcing and deep locality-preserving learning for expression recognition in the wild[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2852-2861.
[26] HUANG G, LIU Z, VAN DER MAATEN L, et al. Densely connected convolutional networks[C]//Proceedings of 2017 IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 2261-2269.
[27] SHAO J, QIAN Y. Three convolutional neural network models for facial expression recognition in the wild[J]. Neurocomputing, 2019, 355: 82-92. doi: 10.1016/j.neucom.2019.05.005
[28] JIANG P, WAN B, WANG Q, et al. Fast and efficient facial expression recognition using a Gabor convolutional network[J]. IEEE Signal Processing Letters, 2020, 27: 1954-1958. doi: 10.1109/LSP.2020.3031504
[29] WANG K, PENG X, YANG J, et al. Region attention networks for pose and occlusion robust facial expression recognition[J]. IEEE Transactions on Image Processing, 2020, 29: 4057-4069. doi: 10.1109/TIP.2019.2956143
[30] MOLLAHOSSEINI A, HASANI B, MAHOOR M H. AffectNet: a database for facial expression, valence, and arousal computing in the wild[J]. IEEE Transactions on Affective Computing, 2017, 10(1): 18-31.
[31] SANG D V, LE T, HA P T. Discriminative deep feature learning for facial emotion recognition[C]//Proceedings of 2018 1st International Conference on Multimedia Analysis and Pattern Recognition. Ho Chi Minh City: IEEE, 2018: 1-6.
[32] MINAEE S, ABDOLRASHIDI A. Deep-emotion: facial expression recognition using attentional convolutional network[J]. Sensors, 2021, 21(9): 3046/1-16.
[33] ZHOU L, FAN X, TJAHJADI T, et al. Discriminative attention-augmented feature learning for facial expression recognition in the wild[J]. Neural Computing and Applications, 2022, 34(2): 925-936. doi: 10.1007/s00521-021-06045-z
[34] LI Y, ZENG J, SHAN S, et al. Occlusion aware facial expression recognition using CNN with attention mechanism[J]. IEEE Transactions on Image Processing, 2018, 28(5): 2439-2450.
[35] GAN Y, CHEN J, YANG Z, et al. Multiple attention network for facial expression recognition[J]. IEEE Access, 2020, 8: 7383-7393. doi: 10.1109/ACCESS.2020.2963913
[36] MA H, CELIK T, LI H C. Lightweight attention convolutional neural network through network slimming for robust facial expression recognition[J]. Signal Image and Video Processing, 2021, 15: 1507-1515. doi: 10.1007/s11760-021-01883-9