Abstract:
To address the large differences in data structure and characteristics across modalities in cross-modal retrieval, a Shared Parameters Cross-modal Retrieval (SPCMR) method based on the common-space approach is proposed. First, high-level semantic features of images and text are extracted by convolutional neural networks; these features are then mapped into the common space through fully connected layers, with part of the hidden-layer weights shared between the two feature subnets; finally, discriminative training is performed by a linear classifier using label information. Experiments are conducted on a public dataset using the mean average precision (mAP) score as the evaluation metric, and the results show that SPCMR makes full use of the semantic information of cross-modal samples and effectively improves the accuracy of image and text retrieval.
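The abstract describes a two-subnet projection with partially shared weights followed by a label-supervised linear classifier. The following is a minimal PyTorch sketch of that structure; the module name `SPCMRHead`, the feature dimensions, and the single shared hidden layer are illustrative assumptions, not the paper's actual configuration.

```python
import torch
import torch.nn as nn

class SPCMRHead(nn.Module):
    """Sketch: map image/text features into a common space with a
    partially shared subnet, then classify with label supervision."""

    def __init__(self, img_dim=4096, txt_dim=300, hidden_dim=1024,
                 common_dim=256, num_classes=10):
        super().__init__()
        # Modality-specific fully connected layers (not shared)
        self.img_fc = nn.Linear(img_dim, hidden_dim)
        self.txt_fc = nn.Linear(txt_dim, hidden_dim)
        # Hidden layer whose weights are shared by both feature subnets
        self.shared_fc = nn.Linear(hidden_dim, common_dim)
        # Linear classifier over the common space, trained with labels
        self.classifier = nn.Linear(common_dim, num_classes)

    def forward(self, img_feat, txt_feat):
        # img_feat / txt_feat: high-level features from CNN backbones
        img_common = self.shared_fc(torch.relu(self.img_fc(img_feat)))
        txt_common = self.shared_fc(torch.relu(self.txt_fc(txt_feat)))
        return (self.classifier(img_common), self.classifier(txt_common),
                img_common, txt_common)
```

In this reading, sharing `shared_fc` between the two subnets is what couples the modalities in the common space, while the classifier outputs allow discriminative (cross-entropy style) training with the label information mentioned in the abstract.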