Abstract:
Most current models for 3D skeleton-based human action recognition that are built on convolutional neural networks do not fully exploit the geometric features embedded in skeleton sequences. To address this deficiency, a multi-stream fusion network model (MS-CNN for short) is proposed, based on the AIF-CNN model. A geometric feature, the kernel feature, is proposed as an input to MS-CNN to enrich the original features. In addition, a multi-motion feature is proposed, which allows the model to learn more robust global motion information. Ablation experiments are conducted on the NTU RGB+D 60 dataset, and the MS-CNN model is compared with 19 action recognition models on the NTU RGB+D 60 dataset and with 8 on the NTU RGB+D 120 dataset. The ablation results show that the MS-CNN model achieves higher recognition accuracy when joint features are fused with kernel features than when they are fused with core features, and that its recognition accuracy improves as the number of multi-motion features increases. The comparison results show that the MS-CNN model outperforms most of the compared models (including the baseline AIF-CNN model) in recognition accuracy under the two evaluation strategies.