Abstract:
In order to better extract and fuse the temporal and spatial features in the human skeleton, a multi-stream convolutional neural network (AE-MCN) that integrates spatio-temporal domain attention module is constructed in this paper. Aiming at the problem that most methods ignore the human motion characteristics when mo-deling the correlation of skeleton sequences, so that the scale of the action is not properly modeled, an adaptive selection motion-scale module is introduced in this paper, which can automatically extract key temporal features from the original scale action features; in order to better model features in the temporal and spatial dimensions, an attention module integrates spatio-temporal domain is designed to help the network extract more effective action information by assigning weights to high-dimensional spatio-temporal features. Finally, the comparative experiments were conducted on three commonly used human action recognition datasets (NTU60, JHMDB and UT-Kinect) to verify the effectiveness of the network AE-MCN proposed in this paper. The experimental results proved that compared with ST-GCN, SR-TSL and other networks, the network AE-MCN has achieved better recognition results, which proved that AE-MCN can effectively extract and model the action information, so as to obtain better action recognition performance.