SupMPN: Supervised Multiple Positives and Negatives Contrastive Learning Model for Semantic Textual Similarity
Abstract
1. Introduction
- We propose SupMPN, a model that fine-tunes BERT to generate sentence embeddings based on the semantic meaning of sentences rather than the frequency of their words.
- We provide a new contrastive objective function that involves multiple hard positives and multiple hard negatives in contrastive learning simultaneously.
- Adding multiple hard positives and multiple hard negatives to contrastive learning boosts performance by discriminating among multiple similar and dissimilar sentences.
- By contrasting multiple similar and dissimilar sentences, our model learns the semantic meaning of sentences and generates a better sentence-representation space.
- Our model outperforms the state-of-the-art SimCSE and all other previous supervised and unsupervised models.
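To make the second contribution concrete, the following is a minimal NumPy sketch of a contrastive objective with multiple hard positives and multiple hard negatives per anchor. It follows the general SupCon/NT-Xent pattern (each positive scored against all candidates, averaged over positives); the exact SupMPN formulation is the one defined in Section 4.1, and the function name and interface here are illustrative assumptions.

```python
import numpy as np

def multi_pos_neg_loss(anchor, positives, negatives, temperature=0.05):
    """Sketch of a contrastive loss with multiple positives and negatives.

    anchor:    (d,)   embedding of the anchor sentence
    positives: (P, d) embeddings of its hard positives
    negatives: (N, d) embeddings of its hard negatives
    """
    def cos(a, b):
        return (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

    # Temperature-scaled cosine similarities, as in NT-Xent.
    pos_sims = np.array([cos(anchor, p) for p in positives]) / temperature
    neg_sims = np.array([cos(anchor, n) for n in negatives]) / temperature
    all_sims = np.concatenate([pos_sims, neg_sims])

    # SupCon-style average over positives: each positive competes
    # against every candidate (all positives and negatives) in the
    # softmax denominator.
    log_denominator = np.log(np.sum(np.exp(all_sims)))
    return float(np.mean(log_denominator - pos_sims))
```

The loss shrinks when positives are pulled toward the anchor and negatives are pushed away, which is the discrimination behavior described in the bullets above.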
2. Related Works
3. Background
3.1. Deep Metric Learning
3.2. Triplet Selection
- Hard positive: a sample with the same label (or meaning) as the anchor that nevertheless lies far from it in the embedding space.
- Hard negative: a sample with a different label (or meaning) from the anchor that nevertheless lies close to it in the embedding space.
3.3. Offline and Online Triplet Mining
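The hard-positive/hard-negative definitions above, combined with online (in-batch) mining, can be sketched as follows. This is an illustrative NumPy implementation of the general strategy (for each anchor, pick the farthest same-label sample and the closest different-label sample within a batch), not the paper's actual code; the function name is an assumption.

```python
import numpy as np

def mine_hard_triplets(embeddings, labels):
    """Online triplet mining over one batch.

    embeddings: (B, d) array of batch embeddings
    labels:     (B,)   integer array of class labels
    Returns a list of (anchor, hard_positive, hard_negative) indices.
    """
    # Pairwise Euclidean distance matrix via broadcasting.
    diff = embeddings[:, None, :] - embeddings[None, :, :]
    dist = np.linalg.norm(diff, axis=-1)

    triplets = []
    for i, lab in enumerate(labels):
        same = (labels == lab) & (np.arange(len(labels)) != i)
        other = labels != lab
        if not same.any() or not other.any():
            continue
        # Hard positive: same label, maximum distance from the anchor.
        hard_pos = int(np.where(same, dist[i], -np.inf).argmax())
        # Hard negative: different label, minimum distance to the anchor.
        hard_neg = int(np.where(other, dist[i], np.inf).argmin())
        triplets.append((i, hard_pos, hard_neg))
    return triplets
```

Offline mining would run the same selection over the whole dataset between epochs; online mining, as sketched here, restricts it to the current batch.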
4. SupMPN: Supervised Multiple Positives and Negatives Contrastive Learning Model
4.1. Training Objective
4.2. Desirable Properties of SupMPN Model
5. Experiments
5.1. Training Data
5.2. Preparing Multiple Positives and Multiple Negatives
5.3. Training Setups
5.4. Baseline and Previous Supervised and Unsupervised Models for Comparison
5.5. First Experiment: Evaluation on STS Tasks
Model | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg. |
---|---|---|---|---|---|---|---|---|
Unsupervised models | ||||||||
Glove embeddings (avg.) † | 55.14 | 70.66 | 59.73 | 68.25 | 63.66 | 58.02 | 53.76 | 61.32 |
fastText embeddings ‡ | 58.85 | 58.83 | 63.42 | 69.05 | 68.24 | 68.26 | 72.98 | 59.76 |
BERTbase (first-last avg.) | 39.70 | 59.38 | 49.67 | 66.03 | 66.19 | 53.87 | 62.06 | 56.70 |
BERTbase-flow-NLI | 58.40 | 67.10 | 60.85 | 75.16 | 71.22 | 68.66 | 64.47 | 66.55 |
BERTbase-whitening-NLI | 57.83 | 66.90 | 60.90 | 75.08 | 71.31 | 68.24 | 63.73 | 66.28 |
IS-BERTbase ♠ | 56.77 | 69.24 | 61.21 | 75.23 | 70.16 | 69.21 | 64.25 | 66.58 |
CT-BERTbase | 61.63 | 76.80 | 68.47 | 77.50 | 76.48 | 74.31 | 69.19 | 72.05 |
SG-BERTbase ♢ | 66.84 | 80.13 | 71.23 | 81.56 | 77.17 | 77.23 | 68.16 | 74.62 |
Mirror-BERTbase ♣ | 69.10 | 81.10 | 73.00 | 81.90 | 75.70 | 78.00 | 69.10 | 75.40 |
SimCSEunsup-BERTbase | 68.40 | 82.41 | 74.38 | 80.91 | 78.56 | 76.85 | 72.23 | 76.25 |
TSDAE-BERTbase ★ | 55.02 | 67.40 | 62.40 | 74.30 | 73.00 | 66.00 | 62.30 | 65.80 |
ConSERT-BERTbase ⧫ | 70.53 | 79.96 | 74.85 | 81.45 | 76.72 | 78.82 | 77.53 | 77.12 |
ConSERT-BERTlarge ⧫ | 73.26 | 82.37 | 77.73 | 83.84 | 78.75 | 81.54 | 78.64 | 79.44 |
RoBERTabase (first-last avg.) | 40.88 | 58.74 | 49.07 | 65.63 | 61.48 | 58.55 | 61.63 | 56.57 |
CLEAR-RoBERTabase ♡ | 49.00 | 48.90 | 57.40 | 63.60 | 65.60 | 72.50 | 75.60 | 61.08 |
DeCLUTR-RoBERTabase ‡ | 52.41 | 75.19 | 65.52 | 77.12 | 78.63 | 72.41 | 68.62 | 69.99 |
Supervised models | ||||||||
InferSent-GloVe † | 52.86 | 66.75 | 62.15 | 72.77 | 66.87 | 68.03 | 65.65 | 65.01 |
Universal Sentence Encoder † | 64.49 | 67.80 | 64.61 | 76.83 | 73.18 | 74.92 | 76.69 | 71.22 |
SBERTbase † | 70.97 | 76.53 | 73.19 | 79.09 | 74.30 | 77.03 | 72.91 | 74.89 |
SBERTbase-nli-v2 ★ | 72.50 | 84.80 | 80.20 | 84.80 | 80.00 | 83.90 | 78.00 | 80.60 |
SBERTbase-flow | 69.78 | 77.27 | 74.35 | 82.01 | 77.46 | 79.12 | 76.21 | 76.60 |
SBERTbase-whitening | 69.65 | 77.57 | 74.66 | 82.27 | 78.39 | 79.52 | 76.91 | 77.00 |
CT-SBERTbase | 74.84 | 83.20 | 78.07 | 83.84 | 77.93 | 81.46 | 76.42 | 79.39 |
SG-BERTbase ♢ | 75.16 | 81.27 | 76.31 | 84.71 | 80.33 | 81.46 | 76.64 | 79.41 |
SimCSEsup-BERTbase | 75.30 | 84.67 | 80.19 | 85.40 | 80.82 | 84.25 | 80.39 | 81.57 |
SupMPN-BERTbase | 75.96 | 84.96 | 80.61 | 85.63 | 81.69 | 84.90 | 80.72 | 82.07 |
SimCSEsup-BERTlarge | 75.78 | 86.33 | 80.44 | 86.06 | 80.86 | 84.87 | 81.14 | 82.21 |
SupMPN-BERTlarge | 77.53 | 86.50 | 81.68 | 85.99 | 82.87 | 86.09 | 81.38 | 83.15 |
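The numbers in the table above are Spearman correlations between cosine similarities of sentence-embedding pairs and human gold ratings, the standard STS protocol. The following is a minimal self-contained sketch of that scoring step (function names are illustrative; real evaluations use the SentEval toolkit cited below).

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (assuming no ties):
    Pearson correlation computed on the rank vectors."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean()
    ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def sts_score(emb_a, emb_b, gold):
    """Score paired sentence embeddings against human similarity ratings.

    emb_a, emb_b: (n, d) arrays of embeddings for the two sentences
                  of each pair
    gold:         (n,)   human similarity ratings (e.g., 0-5)
    """
    cos = np.sum(emb_a * emb_b, axis=1) / (
        np.linalg.norm(emb_a, axis=1) * np.linalg.norm(emb_b, axis=1))
    return spearman(cos, np.asarray(gold, dtype=float))
```

A model whose cosine similarities rank pairs in the same order as the human ratings scores 1.0; the table reports this correlation (×100) per STS benchmark.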
5.6. Second Experiment: Evaluation on Transfer-Learning Tasks
5.7. Third Experiment: Textual Semantic Similarity in Representation Space
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
BERT | Bidirectional Encoder Representations from Transformers |
Bi-LSTM | Bidirectional Long Short-Term Memory |
ELMo | Embeddings from Language Model |
MLM | Masked Language Model |
MNRL | Multiple Negatives Ranking Loss |
NLI | Natural Language Inference |
NLP | Natural Language Processing |
NSP | Next Sentence Prediction |
NT-Xent | Normalized Temperature-scaled Cross-Entropy |
RoBERTa | Robustly Optimized BERT Pretraining Approach |
SBERT | Sentence-BERT |
STS | Semantic Textual Similarity |
SupCon | Supervised Contrastive |
USE | Universal Sentence Encoder |
Appendix A
Appendix A.1. Result of Using Text Augmentation to Prepare Positive Samples
- None: No data augmentation is used. We simply copy each anchor sentence several times as its positive samples (our implementation).
- Random Word Deletion (RD): We randomly delete 10% of each entailment’s words.
- Synonym Replacement (SR): We randomly substitute 20% of each entailment’s words with their synonyms using WordNet [89].
- Paraphrasing (PP): We paraphrase approximately 50% of the entailments.
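The first two augmentation settings above can be sketched with the Python standard library. This is an illustrative toy implementation: the paper performs synonym replacement with WordNet (via tools such as nlpaug [90]), whereas here a small hand-made synonym dictionary stands in; the function names and seeding are assumptions.

```python
import random

def random_word_deletion(sentence, p=0.10, seed=0):
    """RD setting: randomly delete about p of the words."""
    rng = random.Random(seed)
    words = sentence.split()
    kept = [w for w in words if rng.random() >= p]
    # Never return an empty sentence.
    return " ".join(kept) if kept else words[rng.randrange(len(words))]

def synonym_replacement(sentence, synonyms, p=0.20, seed=0):
    """SR setting: replace about p of the words with synonyms drawn
    from a dictionary (WordNet in the paper, a toy dict here)."""
    rng = random.Random(seed)
    out = []
    for w in sentence.split():
        if w.lower() in synonyms and rng.random() < p:
            out.append(rng.choice(synonyms[w.lower()]))
        else:
            out.append(w)
    return " ".join(out)
```

As the table below shows, these perturbations do not improve over simply copying the anchor, which is consistent with dropout noise already acting as minimal augmentation.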
Model | STS12 | STS13 | STS14 | STS15 | STS16 | STS-B | SICK-R | Avg. |
---|---|---|---|---|---|---|---|---|
SupMPN-BERTbase-None | 75.96 | 84.96 | 80.61 | 85.63 | 81.69 | 84.90 | 80.72 | 82.07 |
SupMPN-BERTbase-RD | 75.82 | 84.80 | 80.37 | 85.76 | 81.98 | 84.36 | 80.08 | 81.88 |
SupMPN-BERTbase-SR | 76.75 | 83.82 | 80.34 | 86.04 | 80.67 | 83.88 | 80.09 | 81.66 |
SupMPN-BERTbase-PP | 76.10 | 84.66 | 79.81 | 84.53 | 81.65 | 84.02 | 81.18 | 81.71 |
Appendix A.2. Statistics
Model | Training Data | Size |
---|---|---|
BERT | Book Corpus + English Wikipedia | Not Specified |
BERT-flow | SNLI + MNLI | 570 K + 433 K |
BERT-mirror | Training set of the STS Benchmark (for STS tasks) | 10 K |
BERT-whitening | SNLI + MNLI | 570 K + 433 K |
CLEAR | Book Corpus + English Wikipedia | Not Specified |
CT-BERT | English Wikipedia | Not Specified |
ConSERT | SNLI + MNLI | 570 K + 433 K |
DeCLUTR | Open Web Text corpus | 497 K |
InferSent | SNLI + MNLI | 570 K + 433 K |
IS-BERT | SNLI + MNLI | 570 K + 433 K |
SBERT | SNLI + MNLI | 570 K + 433 K |
SBERT-base-nli-v2 | Part of (SNLI + MNLI) | Not Specified |
SG-BERT | Part of (SNLI + MNLI) | Not Specified |
SimCSEsup | Part of (SNLI + MNLI) | 628 K |
SupMPN | Part of (SNLI + MNLI) | 628 K |
TSDAE | English Wikipedia | Not Specified |
USE | Web sources + Question answering + SNLI | Not Specified |
References
- Hliaoutakis, A.; Varelas, G.; Voutsakis, E.; Petrakis, E.G.M.; Milios, E. Information Retrieval by Semantic Similarity. 2006. Available online: https://www.researchgate.net/publication/283921249_Information_retrieval_by_semantic_similarity (accessed on 1 August 2022).
- Kim, S.; Fiorini, N.; Wilbur, W.J.; Lu, Z. Bridging the gap: Incorporating a semantic similarity measure for effectively mapping PubMed queries to documents. J. Biomed. Inform. 2017, 75, 122–127. [Google Scholar] [CrossRef] [PubMed]
- Mohamed, M.; Oussalah, M. SRL-ESA-TextSum: A text summarization approach based on semantic role labeling and explicit semantic analysis. Inf. Process. Manag. 2019, 56, 1356–1372. [Google Scholar] [CrossRef]
- Hou, Y.-B. A Text Summarization Method Based on Semantic Similarity among Sentences. DEStech Trans. Social Sci. Educ. Human Sci. 2020. [Google Scholar] [CrossRef]
- Mukherjee, I.; Mahanti, P.K.; Bhattacharya, V.; Banerjee, S. Text classification using document-document semantic similarity. Int. J. Web Sci. 2013, 2, 1–26. [Google Scholar] [CrossRef]
- Kim, Y. Convolutional Neural Networks for Sentence Classification. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1746–1751. [Google Scholar] [CrossRef]
- Malandrakis, N.; Falcone, M.; Vaz, C.; Bisogni, J.; Potamianos, A.; Narayanan, S. SAIL: Sentiment Analysis using Semantic Similarity and Contrast Features. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014; pp. 512–516. [Google Scholar] [CrossRef]
- Janda, H.K.; Pawar, A.; Du, S.; Mago, V. Syntactic, Semantic and Sentiment Analysis: The Joint Effect on Automated Essay Evaluation. IEEE Access 2019, 7, 108486–108503. [Google Scholar] [CrossRef]
- Bordes, A.; Chopra, S.; Weston, J. Question Answering with Subgraph Embeddings. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 615–620. [Google Scholar] [CrossRef]
- Lopez-Gazpio, I.; Maritxalar, M.; Gonzalez-Agirre, A.; Rigau, G.; Uria, L.; Agirre, E. Interpretable semantic textual similarity: Finding and explaining differences between sentences. Knowl. Based Syst. 2017, 119, 186–199. [Google Scholar] [CrossRef]
- Castillo, J.; Estrella, P. Semantic Textual Similarity for MT evaluation. In Proceedings of the Seventh Workshop on Statistical Machine Translation, Montreal, QC, Canada, 7–8 June 2012; Available online: https://aclanthology.org/W12-3103 (accessed on 1 August 2022).
- Zou, W.Y.; Socher, R.; Cer, D.; Manning, C.D. Bilingual word embeddings for phrase-based machine translation. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1393–1398. Available online: https://aclanthology.org/D13-1141 (accessed on 1 August 2022).
- Liu, S.; He, T.; Li, J.; Li, Y.; Kumar, A. An Effective Learning Evaluation Method Based on Text Data with Real-time Attribution—A Case Study for Mathematical Class with Students of Junior Middle School in China. ACM Trans. Asian Low-Resour. Lang. Inf. Process. 2021. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 3–5 June 2019; Volume 1, pp. 4171–4186. [Google Scholar] [CrossRef]
- Li, B.; Zhou, H.; He, J.; Wang, M.; Yang, Y.; Li, L. On the Sentence Embeddings from Pre-trained Language Models. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Association for Computational Linguistics, Online, 16–20 November 2020; pp. 9119–9130. [Google Scholar] [CrossRef]
- Yan, Y.; Li, R.; Wang, S.; Zhang, F.; Wu, W.; Xu, W. ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1), Virtual Event, 1–6 August 2021. [Google Scholar] [CrossRef]
- Chen, X.; He, K. Exploring Simple Siamese Representation Learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021. [Google Scholar] [CrossRef]
- Jalal, A.; Kim, Y.-H.; Kim, Y.-J.; Kamal, S.; Kim, D. Robust human activity recognition from depth video using spatiotemporal multi-fused features. Pattern Recognit. 2017, 61, 295–308. [Google Scholar] [CrossRef]
- Jalal, A.; Quaid, M.A.K.; Sidduqi, M.A. A Triaxial Acceleration-based Human Motion Detection for Ambient Smart Home System. In Proceedings of the 2019 IEEE 16th International Conference on Applied Sciences and Technology (IBCAST), Islamabad, Pakistan, 8–12 January 2019; pp. 353–358. [Google Scholar] [CrossRef]
- Wu, H.; Pan, W.; Xiong, X.; Xu, S. Human activity recognition based on the combined SVM&HMM. In Proceedings of the 2014 IEEE International Conference on Information and Automation (ICIA), Hailar, China, 28–30 July 2014; pp. 219–224. [Google Scholar] [CrossRef]
- Piyathilaka, L.; Kodagoda, S. Gaussian mixture based HMM for human daily activity recognition using 3D skeleton features. In Proceedings of the 2013 IEEE 8th Conference on Industrial Electronics and Applications (ICIEA), Melbourne, Australia, 19–21 June 2013. [Google Scholar] [CrossRef]
- Uddin, M.T.; Uddin, A. Human activity recognition from wearable sensors using extremely randomized trees. In Proceedings of the International Conference on Electrical Engineering and Information Communication Technology (ICEEICT), Savar, Bangladesh, 21–23 May 2015. [Google Scholar] [CrossRef]
- Tang, C.I.; Perez-Pozuelo, I.; Spathis, D.; Mascolo, C. Exploring Contrastive Learning in Human Activity Recognition for Healthcare. Presented at the Machine Learning for Mobile Health Workshop at NeurIPS 2020, Vancouver, BC, Canada, 2020. arXiv 2020, arXiv:2011.11542. [Google Scholar] [CrossRef]
- Huang, Q.; Yang, J.; Qiao, Y. Person re-identification across multi-camera system based on local descriptors. In Proceedings of the IEEE Conference on Distributed Smart Cameras, Hong Kong, China, 30 October–2 November 2012; pp. 1–6. [Google Scholar] [CrossRef]
- Khaldi, K.; Shah, S.K. CUPR: Contrastive Unsupervised Learning for Person Re-identification. In Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications (VISIGRAPP 2021)—Volume 5: VISAPP, Online, 8–10 February 2021; pp. 92–100. [Google Scholar] [CrossRef]
- Chen, I.-K.; Chi, C.-Y.; Hsu, S.-L.; Chen, L.-G. A real-time system for object detection and location reminding with RGB-D camera. In Proceedings of the 2014 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 10–13 January 2014. [Google Scholar] [CrossRef]
- Xie, E.; Ding, J.; Wang, W.; Zhan, X.; Xu, H.; Sun, P.; Li, Z.; Luo, P. DetCo: Unsupervised Contrastive Learning for Object Detection. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 8372–8381. [Google Scholar] [CrossRef]
- Ahad, M.A.R.; Kobashi, S.; Tavares, J.M.R.S. Advancements of image processing and vision in healthcare. J. Healthc. Eng. 2018, 2018, 8458024. [Google Scholar] [CrossRef] [PubMed]
- Zhang, Z.; Jang, J.; Trabelsi, C.; Li, R.; Sanner, S.; Jeong, Y.; Shim, D. ExCon: Explanation-driven Supervised Contrastive Learning for Image Classification. arXiv 2021, arXiv:2111.14271. [Google Scholar]
- Rathore, M.M.U.; Ahmad, A.; Paul, A.; Wu, J. Real-time continuous feature extraction in large size satellite images. J. Syst. Archit. EUROMICRO 2016, 64, 122–132. [Google Scholar] [CrossRef]
- Madhusudana, P.C.; Birkbeck, N.; Wang, Y.; Adsumilli, B.; Bovik, A.C. Image Quality Assessment using Contrastive Learning. IEEE Trans. Image Process. 2022, 31, 4149–4161. [Google Scholar] [CrossRef] [PubMed]
- Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised Contrastive Learning. arXiv 2021, arXiv:2004.11362. [Google Scholar]
- Wu, Z.; Wang, S.; Gu, J.; Khabsa, M.; Sun, F.; Ma, H. CLEAR: Contrastive Learning for Sentence Representation. arXiv 2020, arXiv:2012.15466. [Google Scholar]
- Kim, T.; Yoo, K.M.; Lee, S. Self-Guided Contrastive Learning for BERT Sentence Representations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual Event, 1–6 August 2021; Volume 1. [Google Scholar] [CrossRef]
- Giorgi, J.; Nitski, O.; Wang, B.; Bader, G. DeCLUTR: Deep Contrastive Learning for Unsupervised Textual Representations. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, Virtual Event, 1–6 August 2021; Volume 1. [Google Scholar] [CrossRef]
- Liu, F.; Vulić, I.; Korhonen, A.; Collier, N. Fast, Effective, and Self-Supervised: Transforming Masked Language Models into Universal Lexical and Sentence Encoders. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021. [Google Scholar] [CrossRef]
- Gao, T.; Yao, X.; Chen, D. SimCSE: Simple Contrastive Learning of Sentence Embeddings. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Punta Cana, Dominican Republic, 7–11 November 2021. [Google Scholar] [CrossRef]
- Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. arXiv 2020, arXiv:2002.05709. [Google Scholar]
- Hermans, A.; Beyer, L.; Leibe, B. In Defense of the Triplet Loss for Person Re-Identification. arXiv 2017, arXiv:1703.07737. [Google Scholar]
- Sohn, K. Improved Deep Metric Learning with Multi-class N-pair Loss Objective. In Proceedings of the 30th Conference on Neural Information Processing Systems (NIPS 2016), Barcelona, Spain, 5–10 December 2016; Available online: https://proceedings.neurips.cc/paper/2016/file/6b180037abbebea991d8b1232f8a8ca9-Paper.pdf (accessed on 1 August 2022).
- Henderson, M.; Al-Rfou, R.; Strope, B.; Sung, Y.; Lukacs, L.; Guo, R.; Kumar, S.; Miklos, B.; Kurzweil, R. Efficient Natural Language Response Suggestion for Smart Reply. arXiv 2017, arXiv:1705.00652. [Google Scholar]
- Liu, S.; Xu, X.; Zhang, Y.; Muhammad, K.; Fu, W. A Reliable Sample Selection Strategy for Weakly-supervised Visual Tracking. IEEE Trans. Reliab. 2022, 1–12. [Google Scholar] [CrossRef]
- Bowman, S.R.; Angeli, G.; Potts, C.; Manning, C.D. A large annotated corpus for learning natural language inference. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015. [Google Scholar] [CrossRef]
- Williams, A.; Nangia, N.; Bowman, S. A broad-coverage challenge corpus for sentence understanding through inference. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018; Volume 1. [Google Scholar] [CrossRef]
- Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient estimation of word representations in vector space. arXiv 2013, arXiv:1301.3781. [Google Scholar]
- Pennington, J.; Socher, R.; Manning, C. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014. [Google Scholar] [CrossRef]
- Kiros, R.; Zhu, Y.; Salakhutdinov, R.; Zemel, R.; Torralba, A.; Urtasun, R.; Fidler, S. Skip-thought vectors. In Proceedings of the 28th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 3294–3302. [Google Scholar] [CrossRef]
- Bojanowski, P.; Grave, E.; Joulin, A.; Mikolov, T. Enriching word vectors with subword information. Trans. Assoc. Comput. Linguist. 2017, 5, 135–146. [Google Scholar] [CrossRef]
- Conneau, A.; Kiela, D.; Schwenk, H.; Barrault, L.; Bordes, A. Supervised learning of universal sentence representations from natural language inference data. arXiv 2017, arXiv:1705.02364. [Google Scholar]
- Peters, M.E.; Neumann, M.; Iyyer, M.; Gardner, M.; Clark, C.; Lee, K.; Zettlemoyer, L. Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, New Orleans, LA, USA, 1–6 June 2018. [Google Scholar] [CrossRef]
- Cer, D.; Yang, Y.; Kong, S.; Hua, N.; Limtiaco, N.; John, R.; Constant, N.; Guajardo-Cespedes, M.; Yuan, S.; Tar, C.; et al. Universal Sentence Encoder for English. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Brussels, Belgium, 31 October–4 November 2018; pp. 169–174. [Google Scholar] [CrossRef]
- Reimers, N.; Gurevych, I. Sentence-BERT: Sentence embeddings using Siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019. [Google Scholar] [CrossRef]
- Thakur, N.; Reimers, N.; Daxenberger, J.; Gurevych, I. Augmented sbert: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Online, 6–11 June 2021. [Google Scholar] [CrossRef]
- Wang, B.; Kuo, C.-C.J. SBERT-WK: A Sentence Embedding Method by Dissecting BERT-based Word Models. IEEE/ACM Trans. Audio Speech Lang. Process. 2020, 28, 2146–2157. [Google Scholar] [CrossRef]
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. Roberta: A robustly optimized bert pretraining approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
- Zhang, Y.; He, R.; Liu, Z.; Lim, K.H.; Bing, L. An unsupervised sentence embedding method by mutual information maximization. arXiv 2020, arXiv:2009.12061. [Google Scholar]
- Carlsson, F.; Gyllensten, A.C.; Gogoulou, E.; Hellqvist, E.Y.; Sahlgren, M. Semantic Re-Tuning with Contrastive Tension. International Conference on Learning Representations (ICLR). 2021. Available online: https://openreview.net/pdf?id=Ov_sMNau-PF (accessed on 1 August 2022).
- Hadsell, R.; Chopra, S.; LeCun, Y. Dimensionality reduction by learning an invariant mapping. In Proceedings of the 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), New York, NY, USA, 17–22 June 2006. [Google Scholar] [CrossRef]
- Chopra, S.; Hadsell, R.; LeCun, Y. Learning a similarity metric discriminatively with application to face verification. In Proceedings of the 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, San Diego, CA, USA, 20–25 June 2005. [Google Scholar] [CrossRef]
- Schroff, F.; Kalenichenko, D.; Philbin, J. FaceNet: A unified embedding for face recognition and clustering. arXiv 2015, arXiv:1503.03832. [Google Scholar]
- Xuan, H.; Stylianou, A.; Liu, X.; Pless, R. Hard negative examples are hard, but useful. In ECCV 2020: Computer Vision—ECCV 2020; Springer: Cham, Switzerland, 2020. [Google Scholar] [CrossRef]
- Gao, L.; Zhang, Y.; Han, J.; Callan, J. Scaling Deep Contrastive Learning Batch Size under Memory Limited Setup. arXiv 2021, arXiv:2101.06983. [Google Scholar]
- Sikaroudi, M.; Ghojogh, B.; Safarpoor, A.; Karray, F.; Crowley, M.; Tizhoosh, H.R. Offline versus Online Triplet Mining based on Extreme Distances of Histopathology Patches. arXiv 2020, arXiv:2007.02200. [Google Scholar]
- Rosasco, L.; Vito, E.D.; Caponnetto, A.; Piana, M.; Verri, A. Are loss functions all the same? Neural Comput. 2004, 16, 1063–1076. [Google Scholar] [CrossRef] [PubMed]
- Shorten, C.; Khoshgoftaar, M.T.; Furht, B. Text Data Augmentation for Deep Learning. J. Big Data 2021, 8, 101. [Google Scholar] [CrossRef] [PubMed]
- Shorten, C.; Khoshgoftaar, M.T. A survey on Image Data Augmentation for Deep Learning. J. Big Data 2019, 6, 60. [Google Scholar] [CrossRef]
- Wu, X.; Gao, C.; Zang, L.; Han, J.; Wang, Z.; Hu, S. ESimCSE: Enhanced Sample Building Method for Contrastive Learning of Unsupervised Sentence Embedding. arXiv 2021, arXiv:2109.04380. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, A.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Kalantidis, Y.; Sariyildiz, M.B.; Pion, N.; Weinzaepfel, P.; Larlus, D. Hard Negative Mixing for Contrastive Learning. Adv. Neural Inf. Process. Syst. 2020, 33, 21798–21809. [Google Scholar]
- Mitrovic, J.; McWilliams, B.; Rey, M. Less Can Be More in Contrastive Learning. In ICBINB@NeurIPS; 2020; pp. 70–75. Available online: https://openreview.net/pdf?id=U2exBrf_SJh (accessed on 1 August 2022).
- Kingma, D.P.; Dhariwal, P. Glow: Generative flow with invertible 1x1 convolutions. In Proceedings of the Advances in Neural Information Processing Systems 31: Annual Conference on Neural Information Processing Systems, NeurIPS, Montréal, QC, Canada, 3–8 December 2018; pp. 10236–10245. [Google Scholar] [CrossRef]
- Su, J.; Cao, J.; Liu, W.; Ou, Y. Whitening sentence representations for better semantics and faster retrieval. arXiv 2021, arXiv:2103.15316. [Google Scholar]
- Wang, K.; Reimers, N.; Gurevych, I. TSDAE: Using Transformer-based Sequential Denoising Auto-Encoder for Unsupervised Sentence Embedding Learning. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP, Punta Cana, Dominican Republic, 16–20 November 2021. [Google Scholar] [CrossRef]
- Agirre, E.; Cer, D.; Diab, M.; Gonzalez-Agirre, A. SemEval-2012 task 6: A pilot on semantic textual similarity. In *SEM 2012: The First Joint Conference on Lexical and Computational Semantics—Volume 1: Proceedings of the Main Conference and the Shared Task, and Volume 2: Proceedings of the Sixth International Workshop on Semantic Evaluation (SemEval 2012); Association for Computational Linguistics: Atlanta, GA, USA, 2012; pp. 385–393. Available online: https://aclanthology.org/S12-1051 (accessed on 1 August 2022).
- Agirre, E.; Cer, D.; Diab, M.; Gonzalez-Agirre, A.; Guo, W. *SEM 2013 shared task: Semantic textual similarity. In Second Joint Conference on Lexical and Computational Semantics (*SEM), Volume 1: Proceedings of the Main Conference and the Shared Task: Semantic Textual Similarity; Association for Computational Linguistics: Atlanta, GA, USA, 2013; pp. 32–43. Available online: https://aclanthology.org/S13-1004 (accessed on 1 August 2022).
- Agirre, E.; Banea, C.; Cardie, C.; Cer, D.; Diab, M.; Gonzalez-Agirre, A.; Guo, W.; Mihalcea, R.; Rigau, G.; Wiebe, J. SemEval-2014 task 10: Multilingual semantic textual similarity. In Proceedings of the 8th International Workshop on Semantic Evaluation (SemEval 2014), Dublin, Ireland, 23–24 August 2014; pp. 81–91. Available online: https://aclanthology.org/S14-2010 (accessed on 1 August 2022).
- Agirre, E.; Banea, C.; Cardie, C.; Cer, D.; Diab, M.; Gonzalez-Agirre, A.; Guo, W.; Lopez-Gazpio, I.; Maritxalar, M.; Mihalcea, R.; et al. SemEval-2015 task 2: Semantic textual similarity, English, Spanish and pilot on interpretability. In Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015), Denver, CO, USA, 4–5 June 2015; pp. 252–263. [Google Scholar] [CrossRef]
- Agirre, E.; Banea, C.; Cer, D.; Diab, M.; Gonzalez Agirre, A.; Mihalcea, R.; Rigau Claramunt, G.; Wiebe, J. SemEval-2016 task 1: Semantic textual similarity, monolingual and cross-lingual evaluation. In Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016) Association for Computational Linguistics, San Diego, CA, USA, 16–17 June 2016; pp. 497–511. [Google Scholar] [CrossRef]
- Cer, D.; Diab, M.; Agirre, E.; Lopez-Gazpio, I.; Specia, L. SemEval-2017 task 1: Semantic textual similarity multilingual and crosslingual focused evaluation. In Proceedings of the 11th International Workshop on Semantic Evaluation (SemEval-2017), Vancouver, BC, Canada, 3–4 August 2017; pp. 1–14. [Google Scholar] [CrossRef]
- Marelli, M.; Menini, S.; Baroni, M.; Bentivogli, L.; Bernardi, R.; Zamparelli, R. A SICK cure for the evaluation of compositional distributional semantic models. In Proceedings of the International Conference on Language Resources and Evaluation (LREC), Reykjavik, Iceland, 26–31 May 2014; pp. 216–223. Available online: https://aclanthology.org/L14-1314/ (accessed on 1 August 2022).
- Conneau, A.; Kiela, D. SentEval: An evaluation toolkit for universal sentence representations. In Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC), Miyazaki, Japan, 7–12 May 2018; Available online: https://aclanthology.org/L18-1269 (accessed on 1 August 2022).
- Pang, B.; Lee, L. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL’05), Ann Arbor, MI, USA, 25–30 June 2005; pp. 115–124. [Google Scholar] [CrossRef]
- Hu, M.; Liu, B. Mining and Summarizing Customer Reviews. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Seattle, WA, USA, 22–25 August 2004; ACM: New York, NY, USA, 2004; pp. 168–177. Available online: https://www.cs.uic.edu/~liub/publications/kdd04-revSummary.pdf (accessed on 1 August 2022).
- Pang, B.; Lee, L. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. In Proceedings of the 42nd Meeting of the Association for Computational Linguistics (ACL’04), Main Volume, Barcelona, Spain, 21–26 July 2004; pp. 271–278. Available online: https://aclanthology.org/P04-1035 (accessed on 1 August 2022).
- Wiebe, J.; Wilson, T.; Cardie, C. Annotating Expressions of Opinions and Emotions in Language. Lang. Resour. Eval. 2005, 39, 165–210. [Google Scholar] [CrossRef]
- Socher, R.; Perelygin, A.; Wu, J.; Chuang, J.; Manning, C.D.; Ng, A.; Potts, C. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, WA, USA, 18–21 October 2013; pp. 1631–1642. Available online: https://aclanthology.org/D13-1170/ (accessed on 1 August 2022).
- Li, X.; Roth, D. Learning Question Classifiers. In Proceedings of the 19th International Conference on Computational Linguistics—Volume 1, COLING, Taipei, Taiwan, 26–30 August 2002; Association for Computational Linguistics: Stroudsburg, PA, USA, 2002; pp. 1–7. Available online: https://aclanthology.org/C02-1150/ (accessed on 1 August 2022).
- Dolan, B.; Quirk, C.; Brockett, C. Unsupervised Construction of Large Paraphrase Corpora: Exploiting Massively Parallel News Sources. In Proceedings of the 20th International Conference on Computational Linguistics, COLING 2004, Geneva, Switzerland, 23–27 August 2004; Association for Computational Linguistics: Stroudsburg, PA, USA, 2004. Available online: https://aclanthology.org/C04-1051 (accessed on 1 August 2022).
- Miller, G.A.; Beckwith, R.; Fellbaum, C.; Gross, D.; Miller, K.J. WordNet: An online lexical database. Int. J. Lexicogr. 1990, 3, 235–244. [Google Scholar] [CrossRef]
- Ma, E. NLP Augmentation. 2019. Available online: https://github.com/makcedward/nlpaug (accessed on 1 August 2022).
- Damodaran, P. Parrot: Paraphrase Generation for NLU. v1.0. 2021. Available online: https://github.com/PrithivirajDamodaran/Parrot_Paraphraser (accessed on 1 August 2022).
- Zhang, Y.; Baldridge, J.; He, L. PAWS: Paraphrase Adversaries from Word Scrambling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; Volume 1. [Google Scholar] [CrossRef]
- Wieting, J.; Gimpel, K. ParaNMT-50M: Pushing the Limits of Paraphrastic Sentence Embeddings with Millions of Machine Translations. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, Melbourne, Australia, 15–20 July 2018; Volume 1. [Google Scholar] [CrossRef]
- Coucke, A.; Saade, A.; Ball, A.; Bluche, T.; Caulier, A.; Leroy, D.; Doumouro, C.; Gisselbrecht, T.; Caltagirone, F.; Lavril, T.; et al. Snips Voice Platform: An embedded Spoken Language Understanding system for private-by-design voice interfaces. arXiv 2018, arXiv:1805.10190. [Google Scholar]
Dataset | Number of premises with exactly one entailment and one contradiction | Number of premises with two, three, or four entailments and contradictions | Number of premises with five or more entailments and contradictions |
---|---|---|---|
SNLI | 139,299 | 1890 | 7956 |

Dataset | Number of premises with exactly one entailment and one contradiction | Number of premises with exactly two entailments and two contradictions | Number of premises with exactly three entailments and three contradictions |
---|---|---|---|
MNLI | 125,860 | 1783 | 489 |
Model | MR | CR | SUBJ | MPQA | SST-2 | TREC | MRPC | Avg. |
---|---|---|---|---|---|---|---|---|
Unsupervised models | ||||||||
Glove embeddings (avg.) † | 77.25 | 78.30 | 91.17 | 87.85 | 80.18 | 83.00 | 72.87 | 81.52 |
Skip-thought ♠ | 76.50 | 80.10 | 93.60 | 87.10 | 82.00 | 92.20 | 73.00 | 83.50 |
Avg. BERT embedding † | 78.66 | 86.25 | 94.37 | 88.66 | 84.40 | 92.80 | 69.54 | 84.94 |
BERT-[CLS] embedding † | 78.68 | 84.85 | 94.21 | 88.23 | 84.13 | 91.40 | 71.13 | 84.66 |
IS-BERTbase ♠ | 81.09 | 87.18 | 94.96 | 88.75 | 85.96 | 88.64 | 74.24 | 85.83 |
CT-BERTbase ∞ | 79.84 | 84.00 | 94.10 | 88.06 | 82.43 | 89.20 | 73.80 | 84.49 |
SimCSEunsup-BERTbase | 81.18 | 86.46 | 94.45 | 88.88 | 85.50 | 89.80 | 74.43 | 85.51 |
SimCSEunsup-BERTbase-MLM | 82.92 | 87.23 | 95.71 | 88.73 | 86.81 | 87.01 | 78.07 | 86.64 |
Supervised models | ||||||||
InferSent-GloVe † | 81.57 | 86.54 | 92.50 | 90.38 | 84.18 | 88.20 | 75.77 | 85.59 |
Universal Sentence Encoder † | 80.09 | 85.19 | 93.98 | 86.70 | 86.38 | 93.20 | 70.14 | 85.10 |
SBERTbase † | 83.64 | 89.43 | 94.39 | 89.86 | 88.96 | 89.60 | 76.00 | 87.41 |
SG-BERTbase ♢ | 82.47 | 87.42 | 95.40 | 88.92 | 86.20 | 91.60 | 74.21 | 86.60 |
SimCSEsup-BERTbase | 82.69 | 89.25 | 94.81 | 89.59 | 87.31 | 88.40 | 73.51 | 86.51 |
SimCSEsup-BERTbase-MLM | 82.68 | 88.88 | 94.52 | 89.82 | 88.41 | 87.60 | 76.12 | 86.86 |
SupMPN-BERTbase | 82.93 | 89.26 | 94.76 | 90.21 | 86.99 | 88.20 | 76.35 | 86.96 |
SupMPN-BERTlarge | 84.06 | 90.25 | 94.59 | 90.26 | 88.58 | 91.00 | 75.48 | 87.75 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Dehghan, S.; Amasyali, M.F. SupMPN: Supervised Multiple Positives and Negatives Contrastive Learning Model for Semantic Textual Similarity. Appl. Sci. 2022, 12, 9659. https://doi.org/10.3390/app12199659