Citation: LIU Shuang, QIAO Han, XU Qingzhen. The Cross-modal Retrieval Based on Batch Loss[J]. Journal of South China Normal University (Natural Science Edition), 2021, 53(6): 115-121. DOI: 10.6054/j.jscnun.2021101
[1] WANG B K, YANG Y, XU X. Adversarial cross-modal retrieval[C]//Proceedings of the 2017 ACM on Multimedia Conference. Mountain View: ACM, 2017: 154-162.
[2] HOTELLING H. Relations between two sets of variates[J]. Biometrika, 1936, 28: 321-377. http://www.onacademic.com/detail/journal_1000036334687710_32bd.html
[3] RUPNIK J, SHAWE-TAYLOR J. Multi-view canonical correlation analysis[C]//Proceedings of the Conference on Data Mining and Data Warehouses. [S. l. : s. n. ], 2010: 1-4.
[4] KAN M, SHAN S G, ZHANG H K, et al. Multi-view discriminant analysis[C]//Proceedings of the 12th European Conference on Computer Vision. Florence: Springer, 2012: 808-821.
[5] KAN M, SHAN S, ZHANG H K, et al. Multi-view discriminant analysis[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2016, 38(1): 188-194. doi: 10.1109/TPAMI.2015.2435740
[6] ZHAI X H, PENG Y X, XIAO J G. Learning cross-media joint representation with sparse and semisupervised regularization[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2014, 24(6): 965-978. doi: 10.1109/TCSVT.2013.2276704
[7] AKAHO S. A kernel method for canonical correlation analysis[C]//Proceedings of the International Meeting of Psychometric Society. [S. l. : s. n. ], 2001: 263-269.
[8] ANDREW G, ARORA R, BILMES J A, et al. Deep canonical correlation analysis[C]//Proceedings of the 30th International Conference on Machine Learning. Atlanta: PMLR, 2013: 1247-1255.
[9] WANG W R, ARORA R, LIVESCU K, et al. On deep multi-view representation learning[C]//Proceedings of the 32nd International Conference on Machine Learning. Lille: PMLR, 2015: 1083-1092.
[10] PENG Y X, HUANG X, QI J W. Cross-media shared representation by hierarchical learning with multiple deep networks[C]//Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence. New York: IJCAI, 2016: 3846-3853.
[11] PENG Y X, QI J W, HUANG X, et al. CCL: cross-modal correlation learning with multigrained fusion by hierarchical network[J]. IEEE Transactions on Multimedia, 2018, 20(2): 405-420. doi: 10.1109/TMM.2017.2742704
[12] ZHENG L, MA B P, LI G R, et al. Generalized semi-supervised and structured subspace learning for cross-modal retrieval[J]. IEEE Transactions on Multimedia, 2018, 20(1): 128-141. doi: 10.1109/TMM.2017.2723841
[13] PENG Y X, HUANG X, ZHAO Y Z. An overview of cross-media retrieval: concepts, methodologies, benchmarks and challenges[J]. IEEE Transactions on Circuits and Systems for Video Technology, 2018, 28(9): 2372-2385. doi: 10.1109/TCSVT.2017.2705068
[14] PENG Y X, QI J W. CM-GANs: cross-modal generative adversarial networks for common representation[J]. ACM Transactions on Multimedia Computing Communications and Applications, 2019, 15(1): 1-24. http://arxiv.org/pdf/1710.05106
[15] MIKOLOV T, SUTSKEVER I, CHEN K, et al. Distributed representations of words and phrases and their compositionality[C]//Proceedings of the Advances in Neural Information Processing Systems. Lake Tahoe: [s. n. ], 2013: 3111-3119.
[16] KIM Y, MOSCHITTI A, PANG B, et al. Convolutional neural networks for sentence classification[C]//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing. Doha: ACL, 2014: 1746-1751.
[17] ZHEN L L, HU P, WANG X, et al. Deep supervised cross-modal retrieval[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 10394-10403.
[18] WANG X, HAN X T, HUANG W L, et al. Multi-similarity loss with general pair weighting for deep metric learning[C]//Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. Long Beach: IEEE, 2019: 5022-5030.
[19] ELEZI I, VASCON S, TORCINOVICH A, et al. The group loss for deep metric learning[C]//Proceedings of the 16th European Conference on Computer Vision. Switzerland: Springer, 2020: 277-294.
[20] WEIBULL J W. Evolutionary game theory[M]. Massachusetts: MIT Press, 1997.
[21] ROSENFELD A, HUMMEL R A, ZUCKER S W. Scene labeling by relaxation operations[J]. IEEE Transactions on Systems, Man, and Cybernetics, 1976, 6(6): 420-433. http://ieeexplore.ieee.org/iel5/21/4309513/04309519.pdf
[22] PELILLO M. The dynamics of nonlinear relaxation labeling processes[J]. Journal of Mathematical Imaging and Vision, 1997, 7(4): 309-323. doi: 10.1023/A:1008255111261
[23] PEREIRA J C, COVIELLO E, DOYLE G, et al. On the role of correlation and abstraction in cross-modal multimedia retrieval[J]. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2014, 36(3): 521-535. doi: 10.1109/TPAMI.2013.142
[24] RASHTCHIAN C, YOUNG P, HODOSH M, et al. Collecting image annotations using Amazon's Mechanical Turk[C]//Proceedings of the 2010 Workshop on Creating Speech and Language Data with Amazon's Mechanical Turk. Los Angeles: Association for Computational Linguistics, 2010: 139-147.
[25] CHUA T S, TANG J H, HONG R C, et al. NUS-WIDE: a real-world web image database from National University of Singapore[C]//Proceedings of the 8th ACM International Conference on Image and Video Retrieval. Santorini Island: ACM, 2009: 1-9.
[26] FENG F X, WANG X J, LI R F. Cross-modal retrieval with correspondence autoencoder[C]//Proceedings of the ACM International Conference on Multimedia. Orlando: ACM, 2014: 7-16.
[27] KINGMA D P, BA J. Adam: a method for stochastic optimization[J/OL]. arXiv, (2014-12-22)[2021-04-26]. https://arxiv.org/abs/1412.6980v8.
[28] BELLET A, HABRARD A, SEBBAN M. A survey on metric learning for feature vectors and structured data[J/OL]. arXiv, (2013-02-12)[2021-04-26]. http://arxiv.org/abs/1306.6709.
[29] VAN DER MAATEN L, HINTON G. Visualizing data using t-SNE[J]. Journal of Machine Learning Research, 2008, 9: 2579-2605. http://arxiv.org/abs/2108.01301v1
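1. XU Qingzhen, XIAO Bin. Research on cross-modal retrieval with shared parameters in a common space. Journal of South China Normal University (Natural Science Edition). 2023(01): 88-93.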