基于多流融合网络的3D骨架人体行为识别

3D Skeleton-based Human Action Recognition Based on Multi-stream Fusion Network

  • 摘要: 当前大多基于卷积神经网络的3D骨架人体行为识别模型没有充分挖掘骨架序列所蕴含的几何特征,为了弥补这方面的不足,文章在AIF-CNN模型的基础上进行改进,提出多流融合网络模型(MS-CNN)。在此模型中,新增一种几何特征(kernel特征)作为输入,起到了丰富原始特征的作用;新增多运动特征,使模型学习到更加健壮的全局运动信息。最后,在NTU RGB+D 60数据集上进行消融实验,分别在NTU RGB+D 60数据集、NTU RGB+D 120数据集上,将MS-CNN模型与19、8个行为识别模型进行对比实验。消融实验结果表明:MS-CNN模型采用joint特征与kernel特征融合,其识别准确率比与core特征融合的高;随着多运动特征的增多,MS-CNN模型的识别准确率有所提高。对比实验结果表明:MS-CNN模型在2个评估策略下的识别准确率超过了大部分对比模型(包括基准AIF-CNN模型)。

     

    Abstract: Most current 3D skeleton-based human action recognition models built on convolutional neural networks do not fully exploit the geometric features embedded in skeleton sequences. To address this deficiency, a multi-stream fusion network model (MS-CNN) is proposed as an improvement on the AIF-CNN model. A new geometric feature (the kernel feature) is added as an input to MS-CNN, enriching the original features. In addition, multi-motion features are introduced, allowing the model to learn more robust global motion information. Ablation experiments are conducted on the NTU RGB+D 60 dataset, and MS-CNN is compared with 19 action recognition models on NTU RGB+D 60 and with 8 models on NTU RGB+D 120. The ablation results show that fusing joint features with kernel features yields higher recognition accuracy than fusing joint features with core features, and that recognition accuracy improves as more multi-motion features are added. The comparison results show that, under both evaluation strategies, MS-CNN achieves higher recognition accuracy than most of the compared models, including the baseline AIF-CNN model.

     
