The Cross-modal Retrieval Based on Batch Loss

LIU Shuang; QIAO Han; XU Qingzhen

doi:10.6054/j.jscnun.2021101

Journal of South China Normal University (Natural Science Edition), 2021, 53(6): 115-121. DOI: 10.6054/j.jscnun.2021101

LIU Shuang, QIAO Han, XU Qingzhen. The Cross-modal Retrieval Based on Batch Loss[J]. Journal of South China Normal University (Natural Science Edition), 2021, 53(6): 115-121. DOI: 10.6054/j.jscnun.2021101

School of Computer Science, South China Normal University, Guangzhou 510631, China

Received Date: May 01, 2021
Available Online: January 09, 2022

Abstract

Cross-modal retrieval methods that train on sample pairs or triplets construct many sample pairs that are redundant and uninformative. To address this problem, a cross-modal retrieval method based on batch loss (BLCMR) is proposed. First, a batch loss is introduced that takes the similarity of the embedded samples into account and thereby preserves the invariance of cross-modal samples. Second, an iterative procedure refines the predicted category labels so that the semantic category information of the samples is distinguished effectively. Experimental results on three public datasets (Wikipedia, Pascal Sentence and NUS-WIDE-10k) show that BLCMR improves the accuracy of cross-modal retrieval.
- cross-modal retrieval
- batch loss
- iterative approach
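
The two steps named in the abstract can be illustrated with a minimal NumPy sketch. The abstract does not give the similarity measure, the update rule or the loss formula, so everything below is an assumption: cosine similarity clipped at zero, a multiplicative update in the style of relaxation labeling, and a cross-entropy on the refined labels. The function names (`batch_similarity`, `refine_labels`, `batch_loss`) and the parameter `n_iters` are hypothetical and do not come from the paper.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def batch_similarity(emb):
    """Non-negative cosine similarity between all samples in the batch.

    Assumption: negative similarities are clipped to 0 and self-similarity
    is removed, so each sample is influenced only by the other samples.
    """
    unit = emb / (np.linalg.norm(emb, axis=1, keepdims=True) + 1e-12)
    sim = np.clip(unit @ unit.T, 0.0, None)
    np.fill_diagonal(sim, 0.0)
    return sim


def refine_labels(probs, sim, n_iters=3):
    """Iteratively adjust predicted class probabilities.

    Assumed update rule: each sample's probability for a class is multiplied
    by the similarity-weighted support that the rest of the batch gives to
    that class, then every row is renormalized to sum to 1.
    """
    x = probs.copy()
    for _ in range(n_iters):
        support = sim @ x  # shape: (batch, classes)
        x = x * support
        x = x / (x.sum(axis=1, keepdims=True) + 1e-12)
    return x


def batch_loss(img_emb, txt_emb, img_logits, txt_logits, labels, n_iters=3):
    """Cross-entropy on the refined labels of a mixed image/text batch."""
    # Stack both modalities so that similarities are computed across them.
    emb = np.vstack([img_emb, txt_emb])
    probs = softmax(np.vstack([img_logits, txt_logits]))
    refined = refine_labels(probs, batch_similarity(emb), n_iters)
    # Image i and text i are assumed to share the category labels[i].
    targets = np.concatenate([labels, labels])
    picked = refined[np.arange(len(targets)), targets]
    return float(-np.mean(np.log(picked + 1e-12)))
```

In this sketch every sample in the batch, from either modality, contributes to the refined label of every other sample, so no explicit pairs or triplets are constructed. A call such as `batch_loss(img_emb, txt_emb, img_logits, txt_logits, labels)` expects two embedding arrays of shape (n, d), two logit arrays of shape (n, number of classes) and an integer label array of length n.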
