XIAO Bao, WEI Lina, LI Pu, JIANG Yuncheng. The Model of Knowledge Graph Embedding with Text and Relation Path[J]. Journal of South China Normal University (Natural Science Edition), 2020, 52(6): 103-112. doi: 10.6054/j.jscnun.2020101


The Model of Knowledge Graph Embedding with Text and Relation Path

Abstract: Considering that existing models fusing text and path information fail to fully mine and exploit the semantics of texts and paths, a new knowledge graph embedding model (the GETR model) is proposed. First, LDA is used to enrich the semantics of entity description texts and TWE is used to obtain word embeddings and topic embeddings; a modified Bi-LSTM model then encodes the word and topic embeddings into the entity representations to strengthen the semantic expressiveness of entities. Second, multi-step paths between entities are obtained through random walks whose strategy combines PageRank and cosine similarity, and a self-attention mechanism captures the important semantics of these paths, which are incorporated into the translation model for joint training, thereby filtering path noise effectively and improving the efficiency of the model. Finally, the proposed GETR model and the baseline models TransE, DKRL and TKGE are evaluated on the tasks of knowledge graph completion and entity classification with three datasets: FB15K, FB20K and WN18. Experimental results demonstrate that GETR outperforms the baselines, indicating that it is a more effective method for knowledge representation.
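
The abstract names the path-sampling step (random walks guided by a combination of PageRank and cosine similarity) without giving details. The short Python sketch below shows one plausible reading of that step, assuming a directed graph of entities and pre-trained entity embeddings; the mixing weight alpha, the softmax normalization, and all function and variable names are illustrative assumptions rather than the authors' published implementation.

import numpy as np
import networkx as nx


def sample_path(graph, entity_emb, start, length=3, alpha=0.5, seed=0):
    """Random walk from `start` in which the next hop is chosen by mixing each
    neighbour's global PageRank score with the cosine similarity between the
    current entity's embedding and the neighbour's embedding (assumed weighting)."""
    rng = np.random.default_rng(seed)
    pagerank = nx.pagerank(graph)              # global structural importance of every node
    path = [start]
    current = start
    for _ in range(length):
        neighbors = list(graph.successors(current))
        if not neighbors:                      # dead end: stop the walk early
            break
        cur_vec = entity_emb[current]
        scores = []
        for n in neighbors:
            cos = float(np.dot(cur_vec, entity_emb[n]) /
                        (np.linalg.norm(cur_vec) * np.linalg.norm(entity_emb[n]) + 1e-9))
            scores.append(alpha * pagerank[n] + (1.0 - alpha) * cos)
        probs = np.exp(np.asarray(scores) - max(scores))
        probs /= probs.sum()                   # softmax over candidate next hops
        current = neighbors[rng.choice(len(neighbors), p=probs)]
        path.append(current)
    return path


# Toy usage on a tiny directed knowledge graph with random entity embeddings.
g = nx.DiGraph([("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")])
rng = np.random.default_rng(1)
emb = {node: rng.normal(size=8) for node in g.nodes}
print(sample_path(g, emb, "A"))

Biasing the walk toward neighbours that are both globally important (high PageRank) and semantically close to the current entity (high cosine similarity) is one way to keep the sampled multi-step paths relevant, which matches the noise-filtering goal stated in the abstract.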

     
