基于词向量的学术语义搜索研究

Research on Academic Semantic Search Using Word Vector Representations

  • 摘要: 基于学者网提供的计算机专业论文语料库,利用Glove语义分析工具,给出了多种词向量训练方案,比较了各自的优劣性;提出了利用随机映射的方法,在大规模的向量空间中快速定位向量;最后提出了在单个词的语义向量基础上计算整篇学术文档的语义向量的方案.通过一系列实验验证了基于词向量的学术语义搜索方案的有效性,并实际应用于学者网学术搜索中,取得良好的效果.

     

    Abstract: Using the papers in computer science extracted from Scholat as the corpus, multiple word vector training schemes are proposed using the Glove semantic toolkit, and their performances are compared and analyzed. Then, a random projection method is proposed to quickly access vectors in the large vector space. Finally, a semantic vector computing scheme for the whole academic documents is proposed based on the word vector representations. A series of experiments are conducted, and the effectiveness of the proposed scheme word vector based academic semantic search is verified. This scheme is applied to the search function of Scholat and it can obtain satisfying performance.

     

/

返回文章
返回