XU Qingzhen, XIAO Bin. A Study of Shared Parameters Cross-modal Retrieval in Common Spaces[J]. Journal of South China Normal University (Natural Science Edition), 2023, 55(1): 88-93. DOI: 10.6054/j.jscnun.2023008

A Study of Shared Parameters Cross-modal Retrieval in Common Spaces

More Information
  • Received Date: July 27, 2021
  • Available Online: April 11, 2023
  • To address the large differences in data structure and characteristics between modalities in cross-modal retrieval, a Shared Parameters Cross-modal Retrieval (SPCMR) method based on the common space approach is proposed. First, high-level semantic features of images and text are extracted by convolutional neural networks. Then, the features are mapped to a common space through a fully connected layer, with the 2 feature subnets sharing part of their hidden-layer weights. Finally, discriminative training is performed by a linear classifier using label information. Experiments on a public dataset, with the mean average precision (mAP) score as the evaluation metric, show that SPCMR makes full use of the semantic information of cross-modal samples and effectively improves the accuracy of image and text retrieval.
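The pipeline summarized in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the class name `SharedParamNet`, the layer sizes, the ReLU activation, the cosine-similarity ranking, and the all-results mAP protocol are assumptions chosen for illustration, and the convolutional feature extractors are omitted, so the inputs stand in for already-extracted image and text features.

```python
import math
import random

def linear(x, w, b):
    """Fully connected layer on a list-based vector: w @ x + b."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]

def relu(v):
    return [max(0.0, a) for a in v]

def init_layer(n_out, n_in, rng):
    """Small random weights and zero biases; training would learn these."""
    w = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

class SharedParamNet:
    """Hypothetical two-subnet model with one shared layer (sizes are assumed)."""
    def __init__(self, img_dim, txt_dim, hidden=8, common=4, n_classes=3, seed=0):
        rng = random.Random(seed)
        self.img_fc = init_layer(hidden, img_dim, rng)         # image-specific layer
        self.txt_fc = init_layer(hidden, txt_dim, rng)         # text-specific layer
        self.shared_fc = init_layer(common, hidden, rng)       # weights used by both subnets
        self.classifier = init_layer(n_classes, common, rng)   # linear classifier on the common space

    def embed(self, x, modality):
        """Map a pre-extracted feature vector into the common space."""
        w, b = self.img_fc if modality == "image" else self.txt_fc
        h = relu(linear(x, w, b))
        return linear(h, *self.shared_fc)

    def logits(self, z):
        """Class scores used for discriminative training with label information."""
        return linear(z, *self.classifier)

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def average_precision(ranked_relevance):
    """AP over a ranked list of booleans (True = same label as the query)."""
    hits, total = 0, 0.0
    for rank, rel in enumerate(ranked_relevance, start=1):
        if rel:
            hits += 1
            total += hits / rank
    return total / hits if hits else 0.0

def mean_average_precision(queries, q_labels, gallery, g_labels):
    """Rank the gallery by cosine similarity for each query and average the APs."""
    aps = []
    for q, ql in zip(queries, q_labels):
        order = sorted(range(len(gallery)), key=lambda i: -cosine(q, gallery[i]))
        aps.append(average_precision([g_labels[i] == ql for i in order]))
    return sum(aps) / len(aps)
```

For image-to-text retrieval under these assumptions, image queries would be embedded with `embed(x, "image")`, the text gallery with `embed(x, "text")`, and both sets passed to `mean_average_precision` together with their class labels; text-to-image retrieval swaps the roles.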