Teaching Behavior Recognition Model for Special Delivery Classroom Based on Multimodal Feature Fusion Network

薛增灿; 熊芳庆; 郑佳佳; 王均毅; 林桐; 穆肃

doi:10.6054/j.jscnun.2026003

Abstract

To improve the objectivity and accuracy of teaching-quality assessment in special delivery classrooms, this study proposes a Multimodal Feature Fusion Network (MFFN) for recognizing teachers' teaching behaviors. To address the implicit nature of teaching behaviors and the low inter-class discriminability in special delivery classrooms, MFFN fuses textual, speech and image information: an Implicit Feature Adaptive Network (IFANet) mines implicit behavioral cues from text; a Multi-dimensional Voice Information Aggregation (MVIA) module strengthens the acoustic separation of similar behaviors; and an improved YOLOv11 network extracts fine-grained visual features. A dataset of teacher teaching behaviors in special delivery classrooms is constructed, and comparative and ablation experiments are conducted. MFFN outperforms existing baseline models on key metrics including precision, recall and F1-score. Relative to the most recent baseline, it improves precision, recall and mAP@0.5 by 4.8%, 2.1% and 2.5%, respectively, and it improves mAP@0.5:0.95 by 24.3% over YOLOv11. These results lay a foundation for downstream educational applications such as evaluating teaching competence in special delivery classrooms and supporting teachers' professional development.
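For readers less familiar with the reported metrics, the following is a minimal sketch of how precision, recall and F1-score are conventionally computed from per-class counts. It is not taken from the paper: the function name `precision_recall_f1` and the example counts are hypothetical, and the authors' evaluation code may differ (for instance in how scores are averaged across behavior classes).

```python
def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    """Return (precision, recall, F1) from true-positive, false-positive
    and false-negative counts for a single class."""
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall (0.0 when both are zero).
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for one behavior class, for illustration only.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")
```

Because F1 is a harmonic mean, a gain in precision alone raises F1 less than a balanced gain in both precision and recall would.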
