Abstract:
To address the problem that pairwise- and triplet-based sampling in cross-modal retrieval constructs redundant, uninformative sample pairs, a cross-modal retrieval method based on batch loss (BLCMR) is proposed. First, a batch loss is introduced that accounts for the similarity of embedded samples, effectively preserving the invariance of cross-modal samples. Second, an iterative method is introduced to correct the predicted category labels, effectively distinguishing the semantic category information of the samples. Experimental results on three public datasets (Wikipedia, Pascal Sentence, and NUS-WIDE-10k) show that BLCMR effectively improves the accuracy of cross-modal retrieval.
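The abstract does not give the exact formulation of the batch loss, so the following is only an illustrative sketch of the general idea it describes: scoring every image embedding against every text embedding in a batch, so that all cross-modal pairs contribute to the loss and no explicit pair or triplet mining is needed. The function name, temperature parameter, and the softmax cross-entropy form are assumptions, not the paper's definition.

```python
import numpy as np

def batch_loss(img_emb, txt_emb, temperature=0.1):
    """Illustrative batch-wise loss (an assumption, not BLCMR's exact form):
    each image is scored against every text in the batch, so informative
    negatives come for free instead of being mined as pairs/triplets.
    Matched image-text pairs are assumed to share the same row index."""
    # L2-normalize embeddings so the dot product is cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature      # (B, B) cross-modal similarity matrix
    labels = np.arange(len(img))            # diagonal entries are the true pairs

    def xent(l):
        # Numerically stable softmax cross-entropy over each row.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(len(l)), labels].mean()

    # Average over both retrieval directions: image->text and text->image.
    return 0.5 * (xent(logits) + xent(logits.T))
```

When the two modalities' embeddings of matched samples are close, the diagonal dominates each row of the similarity matrix and the loss is small; mismatched batches yield a large loss, which is the invariance property the batch loss is meant to enforce.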