An Intelligent Formula Generation Model Based on Generative Artificial Intelligence and Dual-Graph Decoupled Propagation

Abstract: Intelligent prescription recommendation systems for traditional Chinese medicine (TCM) play an important role in assisting clinical decision-making and in supporting the inheritance and innovation of TCM. However, most corpus data in the TCM domain, such as ancient literature and clinical case records, exists as unstructured knowledge of uneven quality. As a result, intelligent prescription formulation in TCM has long faced three challenges: a lack of high-quality data, sparse structured knowledge graphs, and insufficient use of deep semantic information. To address these issues, this paper proposes LLM-DualKG, an intelligent prescription formulation model that integrates generative large language models (LLMs) with dual-graph hierarchical decoupled propagation. The model comprises two core modules: a semantic enhancement module based on generative artificial intelligence (GenAI) and a dual-graph hierarchical decoupled propagation module based on hierarchical gated fusion. In the semantic enhancement module, collected corpora of ancient TCM literature are used, via prompt engineering, to guide GenAI in generating a high-quality fine-tuning dataset, and Low-Rank Adaptation (LoRA) is employed to fine-tune the DeepSeek-R1 (8B) model for the domain. The fine-tuned model is then prompted, again via prompt engineering, to generate structured textual descriptions for knowledge graph nodes. These descriptions are encoded into semantic vectors, and a gating mechanism adaptively fuses each node's semantic embedding with its topological embedding, thereby mitigating graph sparsity.
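The abstract does not give the exact form of the gate, so the following is only a minimal sketch of one common formulation, assuming an element-wise sigmoid gate computed from the concatenated semantic and topological embeddings. The function name `gated_fusion` and the parameters `gate_weights` and `gate_bias` are hypothetical and are not taken from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function mapping a real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def gated_fusion(semantic, topological, gate_weights, gate_bias):
    """Fuse two same-length node embeddings with an element-wise gate.

    Assumed form (not confirmed by the source):
        g     = sigmoid(W @ [semantic; topological] + b)
        fused = g * semantic + (1 - g) * topological

    gate_weights is a d x 2d matrix (list of rows); gate_bias has length d.
    """
    if len(semantic) != len(topological):
        raise ValueError("embeddings must have the same dimension")
    concat = list(semantic) + list(topological)
    fused = []
    for i, (s, t) in enumerate(zip(semantic, topological)):
        # Pre-activation for dimension i: one row of W times the concatenation.
        z = sum(w * x for w, x in zip(gate_weights[i], concat)) + gate_bias[i]
        g = sigmoid(z)
        # g near 1 keeps the semantic signal; g near 0 keeps the topology.
        fused.append(g * s + (1.0 - g) * t)
    return fused
```

With all-zero gate parameters the gate is 0.5 everywhere and the result is the plain average of the two embeddings. In a trained model the gate parameters would be learned, which is what lets sparsely connected nodes lean more heavily on their text-derived semantics.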
In the dual-graph hierarchical decoupled propagation module, a "symptom-herb" interaction graph and a "herb-herb" knowledge graph are constructed, and a dual-graph hierarchical decoupled propagation method is proposed. By decoupling information propagation across the two graphs and applying a hierarchical gated fusion mechanism, the method separately models the clinical associations between symptoms and herbs and the theoretical compatibility knowledge among herbs. Finally, the resulting node embeddings are fed into a graph convolutional neural network, which generates prescriptions as a top-K list. Comparative experiments against a series of baseline models on two datasets, one of them self-constructed, show superior performance on metrics such as Recall, Precision, and F1-score. To analyze the impact of each module on overall performance, ablation experiments are also conducted, and the results validate the effectiveness and necessity of each module. The proposed method provides an interpretable, scalable, end-to-end solution for knowledge computation and intelligent auxiliary diagnosis and treatment in the field of TCM.
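To make the top-K output and the reported metrics concrete, the sketch below ranks herbs by score and computes Precision, Recall, and F1-score against a reference prescription. It assumes the standard set-based definitions of these metrics. The herb names and scores are made-up illustrative values, and the helper names `top_k_herbs` and `precision_recall_f1` are hypothetical rather than taken from the paper.

```python
def top_k_herbs(scores, k):
    """Return the k highest-scoring herbs (ties broken alphabetically)."""
    ranked = sorted(scores, key=lambda herb: (-scores[herb], herb))
    return ranked[:k]


def precision_recall_f1(recommended, ground_truth):
    """Set-based Precision, Recall, and F1 for one recommended prescription."""
    rec, truth = set(recommended), set(ground_truth)
    hits = len(rec & truth)
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(truth) if truth else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Illustrative scores only; not data from the paper.
scores = {"gancao": 0.9, "huangqi": 0.7, "renshen": 0.4, "fuling": 0.2}
recommended = top_k_herbs(scores, k=2)
p, r, f1 = precision_recall_f1(recommended, {"gancao", "renshen", "fuling"})
```

Here the top-2 list contains one of the three reference herbs, so Precision is 1/2 and Recall is 1/3. In an evaluation these values would typically be averaged over all test prescriptions for each K.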

     
