3D Skeleton-based Human Action Recognition Based on Multi-stream Fusion Network

CHEN Minrong; PENG Junjie; ZENG Guoqiang

doi:10.6054/j.jscnun.2023009

CHEN Minrong, PENG Junjie, ZENG Guoqiang. 3D Skeleton-based Human Action Recognition Based on Multi-stream Fusion Network[J]. Journal of South China Normal University (Natural Science Edition), 2023, 55(1): 94-101. DOI: 10.6054/j.jscnun.2023009

1. School of Computer, South China Normal University, Guangzhou 510631, China
2. College of Cyber Security, Jinan University, Guangzhou 510631, China

Received Date: October 14, 2021
Available Online: April 11, 2023

Abstract

Most current convolutional neural network (CNN) models for 3D skeleton-based human action recognition do not fully exploit the geometric features embedded in skeleton sequences. To address this deficiency, a multi-stream fusion network model (MS-CNN for short) is proposed on the basis of the AIF-CNN model. A geometric feature, the kernel feature, is proposed as an input to MS-CNN to enrich the original features. In addition, a multi-motion feature is proposed, which allows the model to learn more robust global motion information. Finally, ablation experiments are conducted on the NTU RGB+D 60 dataset, and MS-CNN is compared with 19 and 8 action recognition models on the NTU RGB+D 60 and NTU RGB+D 120 datasets, respectively. The ablation results show that the MS-CNN model using joint features fused with kernel features achieves higher recognition accuracy than the model using joint features fused with core features, and that the recognition accuracy of MS-CNN improves as more multi-motion features are added. The comparison results show that MS-CNN outperforms most of the compared models (including the baseline AIF-CNN model) in recognition accuracy under both evaluation strategies.

Keywords:
- human action recognition
- 3D skeleton
- multi-stream fusion network
- convolutional neural network
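The abstract does not specify how the multi-motion features are built or how the streams are fused. The Python sketch below is a hypothetical illustration only, not the authors' MS-CNN. It assumes a skeleton sequence stored as a NumPy array of shape (T, V, C) (frames, joints, coordinates), treats each motion stream as a frame difference at a different temporal stride (the strides 1, 2 and 4 are arbitrary choices), and fuses per-stream classifier outputs by a weighted average of softmax scores. The names `multi_motion_features`, `softmax` and `late_fusion` are invented for this example.

```python
import numpy as np


def multi_motion_features(skeleton, strides=(1, 2, 4)):
    """Hypothetical multi-motion streams: one frame-difference array per stride.

    skeleton: array of shape (T, V, C), i.e. T frames, V joints, C coordinates.
    For stride s, motion[t] = skeleton[t + s] - skeleton[t]; the last s frames
    are zero-padded so that every stream keeps the original length T.
    """
    T = skeleton.shape[0]
    streams = []
    for s in strides:
        motion = np.zeros_like(skeleton)
        if s < T:  # a stride longer than the clip yields an all-zero stream
            motion[: T - s] = skeleton[s:] - skeleton[: T - s]
        streams.append(motion)
    return streams


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def late_fusion(stream_logits, weights=None):
    """Score-level fusion: weighted mean of per-stream class probabilities.

    stream_logits: list of S arrays, each of shape (N, K) for N samples, K classes.
    Returns an (N, K) array whose rows sum to 1.
    """
    probs = np.stack([softmax(l) for l in stream_logits])  # (S, N, K)
    if weights is None:
        weights = np.ones(len(stream_logits))  # equal weight per stream
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    return np.tensordot(weights, probs, axes=1)  # sum over S -> (N, K)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq = rng.normal(size=(8, 25, 3))  # 8 frames, 25 joints (as in NTU RGB+D), xyz
    streams = multi_motion_features(seq)
    print([m.shape for m in streams])  # [(8, 25, 3), (8, 25, 3), (8, 25, 3)]
    # Stand-in logits: pretend each stream scored 2 samples over 60 classes.
    logits = [rng.normal(size=(2, 60)) for _ in streams]
    scores = late_fusion(logits)
    print(scores.shape, np.allclose(scores.sum(axis=1), 1.0))  # (2, 60) True
```

Score-level fusion keeps the streams independent until the final step, so each stream could be trained separately; whether MS-CNN fuses streams at the score level or at the feature level is not stated in the abstract.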

References

[1]	QIAN H F, YI J P, FU Y H. Review of human action recognition based on deep learning[J]. Journal of Frontiers of Computer Science and Technology, 2021, 15(3): 438-455. https://www.cnki.com.cn/Article/CJFDTOTAL-KXTS202103004.htm
[2]	NIU Y Q, SU W J, YU C C, et al. Real-time behavior recognition for intelligent surveillance based on the TX2 environment[J]. Information Technology and Informatization, 2021(4): 243-245. doi: 10.3969/j.issn.1672-9528.2021.04.079
[3]	ZHANG Q B, DING N N, WU H B. Fall recognition method based on BP neural network[J]. Command Information System and Technology, 2021, 12(1): 60-64. https://www.cnki.com.cn/Article/CJFDTOTAL-ZHXT202101011.htm
[4]	JOHANSSON G. Visual perception of biological motion and a model for its analysis[J]. Perception & Psychophysics, 1973, 14(2): 201-211.
[5]	SHAHROUDY A, LIU J, NG T T, et al. NTU RGB+D: a large scale dataset for 3D human activity analysis[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Las Vegas: IEEE, 2016: 1010-1019.
[6]	ZHANG P, LAN C, XING J, et al. View adaptive neural networks for high performance skeleton-based human action recognition[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 41(8): 1963-1978. doi: 10.1109/TPAMI.2019.2896631
[7]	DU Y, FU Y, WANG L. Skeleton based action recognition with convolutional neural network[C]//Proceedings of the 3rd IAPR Asian Conference on Pattern Recognition. Kuala Lumpur: IEEE, 2015: 579-583.
[8]	LI C, ZHONG Q, XIE D, et al. Co-occurrence feature learning from skeleton data for action recognition and detection with hierarchical aggregation[C]//Proceedings of the 27th International Joint Conference on Artificial Intelligence. Stockholm: IJCAI, 2018: 786-792.
[9]	SU H, CHANG Z, YU M, et al. Convolutional neural network with adaptive inferential framework for skeleton-based action recognition[J]. Journal of Visual Communication and Image Representation, 2020, 73: 102925/1-8.
[10]	YAN S, XIONG Y, LIN D. Spatial temporal graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New Orleans: AAAI, 2018: 7444-7452.
[11]	LI M, CHEN S, CHEN X, et al. Actional-structural graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 3595-3603.
[12]	CHENG K, ZHANG Y, HE X, et al. Skeleton-based action recognition with shift graph convolutional network[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Seattle: IEEE, 2020: 183-192.
[13]	LIU J, SHAHROUDY A, PEREZ M, et al. NTU RGB+D 120: a large-scale benchmark for 3D human activity understanding[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019, 42(10): 2684-2701.
[14]	WANG H, WANG L. Modeling temporal dynamics and spatial configurations of actions using two-stream recurrent neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 499-508.
[15]	SONG S, LAN C, XING J, et al. An end-to-end spatio-temporal attention model for human action recognition from skeleton data[C]//Proceedings of the AAAI Conference on Artificial Intelligence. San Francisco: AAAI, 2017: 4263-4270.
[16]	LEE I, KIM D, KANG S, et al. Ensemble deep learning for skeleton-based action recognition using temporal sliding LSTM networks[C]//Proceedings of the IEEE International Conference on Computer Vision. Venice: IEEE, 2017: 1012-1020.
[17]	ZHAO R, WANG K, SU H, et al. Bayesian graph convolution LSTM for skeleton based action recognition[C]//Proceedings of the IEEE International Conference on Computer Vision. Seoul: IEEE, 2019: 6882-6892.
[18]	SHI L, ZHANG Y, CHENG J, et al. Two-stream adaptive graph convolutional networks for skeleton-based action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 12026-12035.
[19]	SHI L, ZHANG Y, CHENG J, et al. Skeleton-based action recognition with directed graph neural networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 7912-7921.
[20]	PENG W, HONG X, CHEN H, et al. Learning graph convolutional network for skeleton-based human action recognition by neural searching[C]//Proceedings of the AAAI Conference on Artificial Intelligence. New York: AAAI, 2020: 2669-2676.
[21]	YANG D, LI M M, FU H, et al. Centrality graph convolutional networks for skeleton-based action recognition[J/OL]. (2020-03-06)[2020-10-15]. arXiv. http://doi.org/10.48550/arXiv.2003.03007.
[22]	CHENG K, ZHANG Y, CAO C, et al. Decoupling GCN with DropGraph module for skeleton-based action recognition[C]//Proceedings of the European Conference on Computer Vision. Glasgow: Springer, 2020: 536-553.
[23]	KIM T S, REITER A. Interpretable 3D human action analysis with temporal convolutional networks[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops. Honolulu: IEEE, 2017: 1623-1631.
[24]	KE Q, BENNAMOUN M, AN S, et al. A new representation of skeleton sequences for 3D action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 3288-3297.
[25]	SI C, JING Y, WANG W, et al. Skeleton-based action recognition with spatial reasoning and temporal stack learning[C]//Proceedings of the European Conference on Computer Vision. Munich: Springer, 2018: 103-118.
[26]	LIU J, SHAHROUDY A, XU D, et al. Spatio-temporal LSTM with trust gates for 3D human action recognition[C]//Proceedings of the European Conference on Computer Vision. Amsterdam: Springer, 2016: 816-833.
[27]	LIU J, WANG G, HU P, et al. Global context-aware attention LSTM networks for 3D action recognition[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Honolulu: IEEE, 2017: 1647-1656.
[28]	KE Q, BENNAMOUN M, AN S, et al. Learning clip representations for skeleton-based 3D action recognition[J]. IEEE Transactions on Image Processing, 2018, 27(6): 2842-2855.
[29]	LIAO S, LYONS T, YANG W, et al. Learning stochastic differential equations using RNN with log signature features[J/OL]. (2019-08-22)[2021-10-15]. arXiv. https://doi.org/10.48550/arXiv.1908.08286.
[30]	MEMMESHEIMER R, THEISEN N, PAULUS D. Gimme signals: discriminative signal encoding for multimodal activity recognition[C]//Proceedings of the IEEE International Conference on Intelligent Robots and Systems (IROS). Las Vegas: IEEE, 2020: 10394-10401.
[31]	PAPADOPOULOS K, GHORBEL E, AOUAD D, et al. Vertex feature encoding and hierarchical temporal modeling in a spatio-temporal graph convolutional network for action recognition[C]//Proceedings of the 25th International Conference on Pattern Recognition. Milan: IEEE, 2021: 452-458.
[32]	CAETANO C, SENA J, BREMOND F, et al. SkeleMotion: a new representation of skeleton joint sequences based on motion information for 3D action recognition[C]//Proceedings of the International Conference on Advanced Video and Signal-based Surveillance (AVSS). Taipei, China: IEEE, 2019: 1-8.
[33]	CAETANO C, BREMOND F, SCHWARTZ W R. Skeleton image representation for 3D action recognition based on tree structure and reference joints[C]//Proceedings of the 32nd SIBGRAPI Conference on Graphics, Patterns and Images. Rio de Janeiro: IEEE, 2019: 16-23.
