Survey on Text Semantic Hashing Technology

SUN Yuqing; HUANG Tian; LI Chengtao; ZHENG Wei; TANG Yong

doi:10.6054/j.jscnun.2024041

Citation: SUN Yuqing, HUANG Tian, LI Chengtao, ZHENG Wei, TANG Yong. Survey on Text Semantic Hashing Technology[J]. Journal of South China Normal University (Natural Science Edition), 2024, 56(3): 93-105. DOI: 10.6054/j.jscnun.2024041

Abstract

Text semantic hashing refers to neural techniques that encode texts into low-dimensional binary codes under semantic similarity constraints. Because hash codes support Hamming distance-based retrieval, text similarity can be computed efficiently over massive data. Text semantic hashing faces many challenges, such as how to embed category information into low-dimensional binary codes, how to enrich semantic information to improve model robustness, and how to optimize models over a discrete coding space. This survey first reviews important progress in text semantic hashing and discusses the technical details of representative methods, including unsupervised models based on text reconstruction and supervised models that integrate categorical information. It then analyzes key techniques, namely semantic enhancement based on neighbor information and latent topic information, and model optimization techniques. It also summarizes the datasets and evaluation metrics for the text semantic hashing task and, on that basis, compares the performance of different methods. Finally, future research directions are discussed.
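To make the efficiency claim about Hamming distance-based retrieval concrete, the following is a minimal Python sketch, not any surveyed model's method. It assumes each text has already been mapped to a real-valued vector by some encoder; the function names `binarize`, `hamming`, and `retrieve` are hypothetical, and simple sign thresholding stands in for whatever binarization a trained hashing model would apply.

```python
from typing import List, Tuple


def binarize(vector: List[float]) -> int:
    """Pack a real-valued vector into an integer bit code.

    Assumption for illustration: bit i is set when component i is positive
    (sign thresholding); a trained hashing model would learn this mapping.
    """
    code = 0
    for i, value in enumerate(vector):
        if value > 0:
            code |= 1 << i
    return code


def hamming(a: int, b: int) -> int:
    """Hamming distance: the number of differing bits, i.e. popcount(a XOR b)."""
    return bin(a ^ b).count("1")


def retrieve(query: int, database: List[int], top_k: int) -> List[Tuple[int, int]]:
    """Linearly scan the database and return (index, distance) pairs
    for the top_k codes nearest to the query, ties broken by index."""
    scored = [(idx, hamming(query, code)) for idx, code in enumerate(database)]
    scored.sort(key=lambda pair: (pair[1], pair[0]))
    return scored[:top_k]


database = [0b1011, 0b1000, 0b0111, 0b1010]
print(retrieve(0b1011, database, top_k=2))  # [(0, 0), (3, 1)]
```

Because the distance reduces to an XOR followed by a bit count on fixed-length codes, each comparison is cheap. The linear scan here is only illustrative; large-scale systems typically add an index on top of the binary codes.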
