Associating Images with Sentences Using Recurrent Canonical Correlation Analysis
Abstract
1. Introduction
2. Related Work
2.1. Association between Images and Sentences
2.2. Attention-Based Models
3. Association between Images and Sentences via Recurrent Canonical Correlation Analysis
3.1. Image Representation Using Contextual Attention-Based LSTM-RNN
3.2. Sentence Representation Using Conventional LSTM-RNN
3.3. Model Learning with Canonical Correlation Analysis
4. Experimental Results
4.1. Datasets and Protocols
4.2. Experimental Details
4.3. Image and Sentence Matching
4.4. Analysis of the Number of Time Steps
4.5. Visualization of Dynamical Attention Maps
4.6. Error Analysis
5. Conclusions and Future Work
Author Contributions
Funding
Conflicts of Interest
References
- Frome, A.; Corrado, G.S.; Shlens, J.; Bengio, S.; Dean, J.; Ranzato, M.A.; Mikolov, T. Devise: A deep visual-semantic embedding model. In Neural Information Processing Systems (NeurIPS); NeurIPS Foundation: Lake Tahoe, NV, USA, 2013. [Google Scholar]
- Karpathy, A.; Joulin, A.; Li, F.F. Deep fragment embeddings for bidirectional image sentence mapping. In Neural Information Processing Systems (NeurIPS); NeurIPS Foundation: Montreal, QC, Canada, 2014. [Google Scholar]
- Yan, F.; Mikolajczyk, K. Deep Correlation for Matching Images and Text. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Klein, B.; Lev, G.; Sadeh, G.; Wolf, L. Associating Neural Word Embeddings with Deep Image Representations using Fisher Vectors. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Socher, R.; Karpathy, A.; Le, Q.V.; Manning, C.D.; Ng, A.Y. Grounded compositional semantics for finding and describing images with sentences. Trans. Assoc. Comput. Linguist. (TACL) 2014, 2, 207–218. [Google Scholar]
- Chen, X.; Zitnick, C.L. Learning a recurrent visual representation for image caption generation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Kiros, R.; Salakhutdinov, R.; Zemel, R.S. Unifying visual-semantic embeddings with multimodal neural language models. arXiv 2014, arXiv:1411.2539. [Google Scholar]
- Ma, L.; Lu, Z.; Shang, L.; Li, H. Multimodal Convolutional Neural Networks for Matching Image and Sentence. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Plummer, B.; Wang, L.; Cervantes, C.; Caicedo, J.; Hockenmaier, J.; Lazebnik, S. Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Santiago, Chile, 7–13 December 2015. [Google Scholar]
- Karpathy, A.; Li, F.F. Deep visual-semantic alignments for generating image descriptions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Rensink, R.A. The dynamic representation of scenes. Vis. Cogn. 2000, 7, 17–42. [Google Scholar] [CrossRef]
- Gregor, K.; Danihelka, I.; Graves, A.; Wierstra, D. DRAW: A recurrent neural network for image generation. arXiv 2015, arXiv:1502.04623. [Google Scholar]
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Sharma, S.; Kiros, R.; Salakhutdinov, R. Action Recognition using Visual Attention. arXiv 2015, arXiv:1511.04119. [Google Scholar]
- Xu, K.; Ba, J.; Kiros, R.; Courville, A.; Salakhutdinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. arXiv 2015, arXiv:1502.03044. [Google Scholar]
- Albright, T.D.; Stoner, G.R. Contextual influences on visual processing. Ann. Rev. Neurosci. 2002, 25, 339–379. [Google Scholar] [CrossRef] [PubMed] [Green Version]
- Ba, J.; Mnih, V.; Kavukcuoglu, K. Multiple object recognition with visual attention. arXiv 2014, arXiv:1412.7755. [Google Scholar]
- Wang, W.; Chen, C.; Wang, Y.; Jiang, T.; Fang, F.; Yao, Y. Simulating human saccadic scanpaths on natural images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Colorado Springs, CO, USA, 20–25 June 2011. [Google Scholar]
- Jo, Y.; Wi, J.; Kim, M.; Lee, J.Y. Flexible Fashion Product Retrieval Using Multimodality-Based Deep Learning. Appl. Sci. 2020, 10, 1569. [Google Scholar] [CrossRef] [Green Version]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Neural Information Processing Systems (NeurIPS); NeurIPS Foundation: Lake Tahoe, NV, USA, 2012; pp. 1106–1114. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Perronnin, F.; Dance, C. Fisher kernels on visual vocabularies for image categorization. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Minneapolis, MN, USA, 17–22 June 2007. [Google Scholar]
- Huang, Y.; Wang, W.; Wang, L. Instance-aware image and sentence matching with selective multimodal lstm. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 2310–2318. [Google Scholar]
- Huang, Y.; Wu, Q.; Wang, W.; Wang, L. Image and sentence matching via semantic concepts and order learning. IEEE Trans. Pattern Anal. Mach. Intell. (TPAMI) 2020, 42, 636–650. [Google Scholar] [CrossRef] [PubMed]
- Li, K.; Zhang, Y.; Li, K.; Li, Y.; Fu, Y. Visual semantic reasoning for image-text matching. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 4654–4662. [Google Scholar]
- Nguyen, D.K.; Okatani, T. Multi-task learning of hierarchical vision-language representation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 10492–10501. [Google Scholar]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Anderson, P.; He, X.; Buehler, C.; Teney, D.; Johnson, M.; Gould, S.; Zhang, L. Bottom-up and top-down attention for image captioning and visual question answering. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018; pp. 6077–6086. [Google Scholar]
- Graves, A. Generating sequences with recurrent neural networks. arXiv 2013, arXiv:1308.0850. [Google Scholar]
- Larochelle, H.; Hinton, G.E. Learning to combine foveal glimpses with a third-order Boltzmann machine. In Neural Information Processing Systems (NeurIPS); NeurIPS Foundation: Vancouver, BC, Canada, 2010. [Google Scholar]
- Mnih, V.; Heess, N.; Graves, A. Recurrent models of visual attention. In Neural Information Processing Systems (NeurIPS); NeurIPS Foundation: Montreal, QC, Canada, 2014. [Google Scholar]
- Hu, X.; Yang, K.; Fei, L.; Wang, K. Acnet: Attention based network to exploit complementary features for rgbd semantic segmentation. In Proceedings of the IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 1440–1444. [Google Scholar]
- Huang, Z.; Wang, X.; Huang, L.; Huang, C.; Wei, Y.; Liu, W. Ccnet: Criss-cross attention for semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 603–612. [Google Scholar]
- Yang, K.; Hu, X.; Chen, H.; Xiang, K.; Wang, K.; Stiefelhagen, R. Ds-pass: Detail-sensitive panoramic annular semantic segmentation through swaftnet for surrounding sensing. arXiv 2019, arXiv:1909.07721. [Google Scholar]
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef] [Green Version]
- Arevalo, J.; Solorio, T.; Montes-y Gómez, M.; González, F.A. Gated multimodal units for information fusion. arXiv 2017, arXiv:1702.01992. [Google Scholar]
- Zaremba, W.; Sutskever, I.; Vinyals, O. Recurrent neural network regularization. arXiv 2014, arXiv:1409.2329. [Google Scholar]
- Davis, S.B.; Mermelstein, P. Comparison of parametric representations for monosyllabic word recognition in continuously spoken sentences. IEEE Trans. Acoust. Speech Signal Process. (TASSP) 1980, 28, 357–366. [Google Scholar] [CrossRef] [Green Version]
- Andrew, G.; Arora, R.; Bilmes, J.; Livescu, K. Deep canonical correlation analysis. In Proceedings of the International Conference on Machine Learning (ICML), Atlanta, GA, USA, 16–21 June 2013. [Google Scholar]
- Yan, F.; Mikolajczyk, K. Deep correlation for matching images and text. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Hodosh, M.; Young, P.; Hockenmaier, J. Framing image description as a ranking task: Data, models and evaluation metrics. J. Artif. Intell. Res. 2013, 47, 853–899. [Google Scholar] [CrossRef] [Green Version]
- Young, P.; Lai, A.; Hodosh, M.; Hockenmaier, J. From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Trans. Assoc. Comput. Linguist. (TACL) 2014, 2, 67–78. [Google Scholar] [CrossRef]
- Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft COCO: Common objects in context. In European Conference on Computer Vision (ECCV); Springer: Zurich, Switzerland, 2014. [Google Scholar]
- Mao, J.; Xu, W.; Yang, Y.; Wang, J.; Yuille, A.L. Explain images with multimodal recurrent neural networks. In Proceedings of the International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
- Paszke, A.; Gross, S.; Massa, F.; Lerer, A.; Bradbury, J.; Chanan, G.; Killeen, T.; Lin, Z.; Gimelshein, N.; Antiga, L.; et al. Pytorch: An imperative style, high-performance deep learning library. In Neural Information Processing Systems (NeurIPS); NeurIPS Foundation: Vancouver, BC, Canada, 2019; pp. 8026–8037. [Google Scholar]
- Vinyals, O.; Toshev, A.; Bengio, S.; Erhan, D. Show and tell: A neural image caption generator. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Donahue, J.; Hendricks, L.A.; Guadarrama, S.; Rohrbach, M.; Venugopalan, S.; Saenko, K.; Darrell, T. Long-term recurrent convolutional networks for visual recognition and description. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015. [Google Scholar]
- Kiros, R.; Zhu, Y.; Salakhutdinov, R.R.; Zemel, R.; Urtasun, R.; Torralba, A.; Fidler, S. Skip-thought vectors. In Neural Information Processing Systems (NeurIPS); NeurIPS Foundation: Montreal, QC, Canada, 2015. [Google Scholar]
- Huang, Y.; Wang, L. Acmm: Aligned cross-modal memory for few-shot image and sentence matching. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Seoul, Korea, 27 October–2 November 2019; pp. 5774–5783. [Google Scholar]
| Method | Image Annotation R@1 | R@5 | R@10 | Med r | Image Retrieval R@1 | R@5 | R@10 | Med r | Sum |
|---|---|---|---|---|---|---|---|---|---|
| DeViSE [1] | 4.8 | 16.5 | 27.3 | 28.0 | 5.9 | 20.1 | 29.6 | 29 | 104.2 |
| SDT-RNN [5] | 6.0 | 22.7 | 34.0 | 23.0 | 6.6 | 21.6 | 31.7 | 25 | 122.6 |
| Deep Fragment [2] | 12.6 | 32.9 | 44.0 | 14 | 9.7 | 29.6 | 42.5 | 15 | 171.3 |
| RVP (T+I) [6] | 11.7 | 34.8 | 48.6 | 11.2 | 11.4 | 32.0 | 46.2 | 11 | 184.7 |
| m-RNN [45] | 14.5 | 37.2 | 48.5 | 11 | 11.5 | 31.0 | 42.4 | 15 | 185.0 |
| DCCA [3] | 17.9 | 40.3 | 51.9 | 9 | 12.7 | 31.2 | 44.1 | 13 | 197.9 |
| DVSA (BRNN) [10] | 16.5 | 40.6 | 54.2 | 7.6 | 11.8 | 32.1 | 44.7 | 12.4 | 199.9 |
| MNLM [7] | 18.0 | 40.9 | 55.0 | 8 | 12.5 | 37.0 | 51.5 | 10 | 214.9 |
| NIC [47] | 20.0 | - | 61.0 | 6 | 19.0 | - | 64.0 | 5 | - |
| m-CNN-st [8] | 18.1 | 44.1 | 57.9 | 7 | 14.6 | 38.5 | 53.5 | 9 | 226.7 |
| [8] | 24.8 | 53.7 | 67.1 | 5 | 20.3 | 47.6 | 61.7 | 5 | 275.2 |
| FV (HGLMM) [4] | 28.5 | 58.4 | 71.7 | 4 | 20.6 | 49.4 | 64.0 | 6 | 292.6 |
| [4] | 31.0 | 59.3 | 73.7 | 4 | 21.3 | 50.0 | 64.8 | 5 | 300.1 |
| Ours: | | | | | | | | | |
| RCCA-nc | 19.5 | 44.8 | 58.2 | 7 | 14.7 | 39.9 | 53.0 | 9 | 230.1 |
| RCCA-na | 22.8 | 48.9 | 63.3 | 6 | 16.6 | 41.1 | 54.0 | 9 | 246.7 |
| RCCA-fc | 23.3 | 49.4 | 64.0 | 6 | 17.0 | 41.2 | 53.5 | 8 | 248.4 |
| RCCA-lc | 25.6 | 51.9 | 65.4 | 5 | 17.7 | 41.2 | 53.7 | 9 | 255.5 |
| RCCA-ac | 26.5 | 55.0 | 67.9 | 4 | 18.2 | 45.3 | 58.2 | 7 | 271.1 |
| -f30k | 28.1 | 56.1 | 66.3 | 4 | 18.5 | 43.8 | 59.6 | 7 | 272.4 |
| -coco | 29.0 | 57.7 | 68.2 | 4 | 19.7 | 45.3 | 61.1 | 6 | 281.0 |
| | 30.3 | 59.6 | 69.7 | 3 | 20.6 | 47.9 | 62.1 | 6 | 290.2 |
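
All tables report Recall@K (R@1, R@5, R@10; higher is better) and the median rank of the highest-ranked ground-truth match (Med r; lower is better) for image annotation (image-to-sentence retrieval) and image retrieval (sentence-to-image retrieval); Sum is the sum of the six recall scores. As a reference point, the sketch below shows how these standard metrics are typically computed from a cross-modal similarity matrix; the function and variable names are illustrative and not taken from the paper's code.

```python
import numpy as np

def recall_at_k_and_med_rank(similarity, ks=(1, 5, 10)):
    """Compute R@K (%) and median rank from a (queries x candidates) similarity matrix.

    Assumes a one-to-one pairing: the ground-truth match of query i is candidate i.
    (On Flickr/COCO-style data with five sentences per image, the ground-truth set per
    query is larger; this sketch keeps the simpler paired case for clarity.)
    """
    n = similarity.shape[0]
    order = np.argsort(-similarity, axis=1)  # candidate indices sorted by descending score
    # 1-based rank of the ground-truth candidate for each query
    gt_rank = np.array([int(np.where(order[i] == i)[0][0]) + 1 for i in range(n)])
    recalls = {f"R@{k}": 100.0 * float(np.mean(gt_rank <= k)) for k in ks}
    return recalls, float(np.median(gt_rank))

# Toy usage with random embeddings for 100 image-sentence pairs in a shared space.
rng = np.random.default_rng(0)
img, txt = rng.normal(size=(100, 64)), rng.normal(size=(100, 64))
sim = img @ txt.T                           # image-to-sentence scores (image annotation)
print(recall_at_k_and_med_rank(sim))        # sentence-to-image would use sim.T
```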
| Method | Image Annotation R@1 | R@5 | R@10 | Med r | Image Retrieval R@1 | R@5 | R@10 | Med r | Sum |
|---|---|---|---|---|---|---|---|---|---|
| DeViSE [1] | 4.5 | 18.1 | 29.2 | 26 | 6.7 | 21.9 | 32.7 | 25 | 113.1 |
| SDT-RNN [5] | 9.6 | 29.8 | 41.1 | 16 | 8.9 | 29.8 | 41.1 | 16 | 160.3 |
| RVP (T+I) [6] | 12.1 | 27.8 | 47.8 | 11 | 12.7 | 33.1 | 44.9 | 12.5 | 178.4 |
| Deep Fragment [2] | 14.2 | 37.7 | 51.3 | 10 | 10.2 | 30.8 | 44.2 | 14 | 188.4 |
| DCCA [3] | 16.7 | 39.3 | 52.9 | 8 | 12.6 | 31.0 | 43.0 | 15 | 195.5 |
| NIC [47] | 17.0 | - | 56.0 | 7 | 17.0 | - | 57.0 | 7 | - |
| DVSA (BRNN) [10] | 22.2 | 48.2 | 61.4 | 4.8 | 15.2 | 37.7 | 50.5 | 9.2 | 235.2 |
| MNLM [7] | 23.0 | 50.7 | 62.9 | 5 | 16.8 | 42.0 | 56.5 | 8 | 251.9 |
| LRCN [48] | - | - | - | - | 17.5 | 40.3 | 50.8 | 9 | - |
| m-RNN [45] | 35.4 | 63.8 | 73.7 | 3 | 22.8 | 50.7 | 63.1 | 5 | 309.5 |
| FV (HGLMM) [4] | 34.4 | 61.0 | 72.3 | 3 | 24.4 | 52.1 | 65.6 | 5 | 309.8 |
| [4] | 35.0 | 62.0 | 73.8 | 3 | 25.0 | 52.7 | 66.0 | 5 | 314.5 |
| m-CNN-st [8] | 27.0 | 56.4 | 70.1 | 4 | 19.7 | 48.4 | 62.3 | 6 | 283.9 |
| [8] | 33.6 | 64.1 | 74.9 | 3 | 26.2 | 56.3 | 69.6 | 4 | 324.7 |
| [9] | 37.4 | 63.1 | 74.3 | - | 26.0 | 56.0 | 69.3 | - | 326.1 |
| Ours: | | | | | | | | | |
| RCCA-nc | 27.5 | 53.3 | 66.9 | 5 | 20.9 | 46.7 | 58.8 | 7 | 274.1 |
| RCCA-na | 34.4 | 61.6 | 71.2 | 3 | 23.9 | 51.1 | 61.8 | 5 | 304.0 |
| RCCA-fc | 32.2 | 60.4 | 72.3 | 3 | 23.7 | 53.1 | 65.0 | 5 | 306.7 |
| RCCA-lc | 32.2 | 61.2 | 72.4 | 3 | 23.9 | 53.4 | 65.8 | 5 | 308.9 |
| RCCA-ac | 36.0 | 65.8 | 75.6 | 3 | 25.8 | 53.9 | 65.7 | 5 | 322.8 |
| | 39.3 | 68.7 | 78.2 | 2 | 28.7 | 57.2 | 69.8 | 4 | 341.9 |
| Method | Image Annotation R@1 | R@5 | R@10 | Med r | Image Retrieval R@1 | R@5 | R@10 | Med r | Sum |
|---|---|---|---|---|---|---|---|---|---|
| STD (bi-skip) [49] | 32.7 | 67.3 | 79.6 | 3 | 24.2 | 57.1 | 73.2 | 4 | 334.1 |
| [49] | 33.8 | 67.7 | 82.1 | 3 | 25.9 | 60.0 | 74.6 | 4 | 344.1 |
| m-RNN [45] | 41.0 | 73.0 | 83.5 | 2 | 29.0 | 42.2 | 77.0 | 3 | 345.7 |
| FV (HGLMM) [4] | 37.7 | 66.6 | 79.1 | 3 | 24.9 | 58.8 | 76.5 | 4 | 343.6 |
| [4] | 39.4 | 67.9 | 80.9 | 2 | 25.1 | 59.8 | 76.6 | 4 | 349.7 |
| DVSA [10] | 38.4 | 69.9 | 80.5 | 1 | 27.4 | 60.2 | 74.8 | 3 | 351.2 |
| MNLM [7] | 43.4 | 75.7 | 85.8 | 2 | 31.0 | 66.7 | 79.9 | 3 | 382.5 |
| m-CNN-st [8] | 38.3 | 69.6 | 81.0 | 2 | 27.4 | 63.4 | 79.5 | 3 | 359.2 |
| [8] | 42.8 | 73.1 | 84.1 | 2 | 32.6 | 68.6 | 82.8 | 3 | 384.0 |
| Ours: | | | | | | | | | |
| RCCA-nc | 37.4 | 70.3 | 81.5 | 2 | 29.7 | 65.3 | 79.8 | 3 | 364.0 |
| RCCA-na | 40.7 | 71.0 | 84.6 | 2 | 32.9 | 68.8 | 81.0 | 3 | 379.0 |
| RCCA-fc | 41.9 | 73.5 | 84.1 | 2 | 33.4 | 68.1 | 81.8 | 3 | 382.8 |
| RCCA-lc | 42.3 | 77.8 | 87.9 | 2 | 34.3 | 68.4 | 81.0 | 3 | 391.7 |
| RCCA-ac | 44.9 | 79.6 | 87.7 | 2 | 35.8 | 71.2 | 83.3 | 2 | 402.5 |
| | 49.4 | 80.1 | 89.5 | 2 | 37.9 | 73.5 | 84.9 | 2 | 415.3 |
| Method | Image Annotation R@1 | R@5 | R@10 | Med r | Image Retrieval R@1 | R@5 | R@10 | Med r | Sum |
|---|---|---|---|---|---|---|---|---|---|
| RCCA-ac: | | | | | | | | | |
| | 42.5 | 74.9 | 86.0 | 2 | 32.1 | 67.7 | 81.4 | 3 | 384.6 |
| | 44.9 | 79.6 | 87.7 | 2 | 35.8 | 71.2 | 83.3 | 2 | 402.5 |
| | 44.0 | 78.0 | 86.8 | 2 | 35.4 | 71.0 | 83.1 | 2 | 398.3 |
| | 44.1 | 76.9 | 86.7 | 2 | 35.5 | 71.1 | 83.2 | 2 | 397.5 |
| | 43.2 | 76.5 | 87.1 | 2 | 34.7 | 71.0 | 83.1 | 2 | 395.6 |
| RCCA-nc: | | | | | | | | | |
| | 38.5 | 70.1 | 80.9 | 2 | 28.0 | 63.2 | 76.1 | 3 | 356.8 |
| | 37.4 | 70.3 | 81.5 | 2 | 29.7 | 65.3 | 79.8 | 3 | 364.0 |
| | 34.9 | 67.1 | 78.9 | 3 | 28.8 | 64.5 | 77.5 | 3 | 351.7 |
| | 36.2 | 65.6 | 77.9 | 3 | 28.9 | 64.0 | 77.2 | 3 | 350.9 |
| | 32.6 | 64.7 | 76.6 | 3 | 27.1 | 62.6 | 75.7 | 3 | 339.2 |
[Figure: dynamical attention maps for each input image at the 1st, 2nd, and 3rd time steps, comparing RCCA-ac with RCCA-nc; the images themselves are not reproduced here.]
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).