Multi-stream Convolutional Human Action Recognition Based on the Fusion of Spatio-Temporal Domain Attention Module

Wu Ziyi; Chen Minrong

doi:10.6054/j.jscnun.2023043


Abstract

To better extract and fuse the temporal and spatial features of the human skeleton, this paper constructs AE-MCN, a multi-stream convolutional neural network that incorporates a spatio-temporal domain attention module. Most existing methods ignore the characteristics of human motion when modeling the correlations within skeleton sequences, and therefore do not model the scale of motion appropriately. To address this, an adaptive motion-scale selection module is introduced that adaptively extracts key temporal features from the original-scale action features. To better model features along the temporal and spatial dimensions, a spatio-temporal attention module is designed that assigns weights to high-dimensional spatio-temporal features, helping the network extract more effective action information. Finally, comparative experiments on three commonly used human action recognition datasets (NTU60, JHMDB and UT-Kinect) verify the effectiveness of AE-MCN. The results show that AE-MCN achieves better recognition performance than ST-GCN, SR-TSL and other networks, indicating that it can effectively extract and model action information.
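The abstract does not give the equations of either module. Purely as an illustration, the pure-Python sketch below shows one common way to realize the two ideas on a skeleton feature tensor laid out as `x[c][t][v]` (channel, frame, joint). Everything in it is an assumption rather than AE-MCN's published design: motion is approximated by frame differences at hypothetical strides 1, 2 and 4, the "adaptive scale selection" is a softmax over per-scale descriptors (in the spirit of selective-kernel networks), and the attention is a pair of sigmoid gates (one per frame, one per joint) with hypothetical scalar parameters `w_t`, `b_t`, `w_v`, `b_v`.

```python
import math
from typing import List, Sequence, Tuple

# Hypothetical layout: x[c][t][v] = channel c, frame t, joint v.
Tensor = List[List[List[float]]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs: Sequence[float]) -> List[float]:
    m = max(zs)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def motion_at_scale(x: Tensor, stride: int) -> Tensor:
    """Frame difference x[t + stride] - x[t]; indices past the end clamp to the last frame."""
    T = len(x[0])
    return [[[x_c[min(t + stride, T - 1)][v] - x_c[t][v]
              for v in range(len(x_c[t]))]
             for t in range(T)]
            for x_c in x]


def select_motion_scale(x: Tensor, strides: Sequence[int] = (1, 2, 4)) -> Tuple[Tensor, List[float]]:
    """Hypothetical soft selection over temporal scales (selective-kernel style)."""
    motions = [motion_at_scale(x, s) for s in strides]
    # One scalar descriptor per scale: mean absolute motion over the whole tensor.
    descriptors = []
    for m in motions:
        vals = [abs(val) for m_c in m for m_t in m_c for val in m_t]
        descriptors.append(sum(vals) / len(vals))
    weights = softmax(descriptors)  # scales with more motion get larger weights
    fused = [[[sum(w * m[c][t][v] for w, m in zip(weights, motions))
               for v in range(len(x[c][t]))]
              for t in range(len(x[c]))]
             for c in range(len(x))]
    return fused, weights


def spatio_temporal_attention(x: Tensor, w_t: float = 1.0, b_t: float = 0.0,
                              w_v: float = 1.0, b_v: float = 0.0) -> Tensor:
    """Hypothetical gating: one weight per frame and one per joint, each in (0, 1)."""
    C, T, V = len(x), len(x[0]), len(x[0][0])
    # Temporal descriptor: average over channels and joints for each frame.
    frame_desc = [sum(x[c][t][v] for c in range(C) for v in range(V)) / (C * V) for t in range(T)]
    # Spatial descriptor: average over channels and frames for each joint.
    joint_desc = [sum(x[c][t][v] for c in range(C) for t in range(T)) / (C * T) for v in range(V)]
    a = [sigmoid(w_t * d + b_t) for d in frame_desc]
    b = [sigmoid(w_v * d + b_v) for d in joint_desc]
    # Reweight every element by its frame weight and its joint weight.
    return [[[x[c][t][v] * a[t] * b[v] for v in range(V)] for t in range(T)] for c in range(C)]


if __name__ == "__main__":
    # Toy input: C=2 channels, T=5 frames, V=3 joints, values rising linearly over time.
    x = [[[0.1 * (c + 1) * t + 0.05 * v for v in range(3)] for t in range(5)] for c in range(2)]
    motion, scale_weights = select_motion_scale(x)
    y = spatio_temporal_attention(motion)
    print("scale weights:", [round(w, 3) for w in scale_weights])
```

In a trained network the per-scale descriptors and the frame and joint descriptors would normally pass through learned layers (for example 1x1 convolutions or small fully connected layers) rather than the fixed scalars used here; the sketch only shows the data flow of "select a motion scale, then reweight frames and joints".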
