A Comparison-Based Framework for Argument Quality Assessment
Abstract
1. Introduction
- We introduce CompAQA, a novel comparison-based framework for argument quality assessment, which is applicable to both the pairwise argument quality classification task and the argument quality ranking task.
- CompAQA improves the objectivity and accuracy of argument quality ranking by systematically leveraging multiple pairwise comparisons with carefully selected reference arguments.
- Extensive evaluations across multiple datasets and model architectures validate the superiority and versatility of CompAQA.
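In brief, CompAQA scores a target argument by comparing it against several reference arguments and aggregating the pairwise outcomes. The sketch below illustrates this aggregation idea under our own simplifying assumptions: `compare` stands in for a trained pairwise comparison module, and plain averaging is used in place of the paper's exact aggregation rule.

```python
from typing import Callable, List

def predict_quality(target: str,
                    references: List[str],
                    compare: Callable[[str, str], float]) -> float:
    """Score `target` by averaging pairwise comparison outcomes against
    a set of reference arguments.  `compare(a, b)` is assumed to return
    the probability that `a` is the higher-quality argument."""
    if not references:
        raise ValueError("need at least one reference argument")
    return sum(compare(target, ref) for ref in references) / len(references)

# Toy comparison function for demonstration only: longer arguments "win".
def toy_compare(a: str, b: str) -> float:
    return 1.0 if len(a) > len(b) else 0.0

score = predict_quality(
    "We should ban gambling because it preys on vulnerable people.",
    ["Gambling is bad.", "Ban gambling."],
    toy_compare,
)
```

In the actual framework, `compare` would be a fine-tuned pre-trained language model rather than a heuristic, but the aggregation structure is the same.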
2. Related Work
3. Method
3.1. Problem Definition
3.2. Constructing Comparison Pairs
3.3. Pairwise Comparison Module
3.3.1. Text Encoding
3.3.2. Pairwise Classification
3.3.3. Quality Score Prediction
3.4. Order-Based Data Augmentation
3.5. Inference
4. Experiments
4.1. Datasets
4.2. Evaluation Metrics
4.3. Implementation Details
4.4. Compared Methods
- SVM BOW [13] is a support vector regression ranker with an RBF kernel and bag-of-words features.
- BERT, RoBERTa, and DeBERTa refer to pre-trained language models fine-tuned on each dataset, following the fine-tuning procedures of Gretz et al. [13] and Toledo et al. [21]. After confirming with the authors, we learned that Toledo et al. [21] did not use a validation set in their BERT-based experiments on IBM-ArgQ-9.1kPairs and IBM-ArgQ-5.3kArgs. We therefore replicated the BERT baseline on these two datasets using our own data split and the hyperparameters provided by Toledo et al. [21]. Note that the inputs of all these models include the topic of each argument.
- TFR-BERT [39] is an ensemble method that combines multiple BERT models fine-tuned with various ranking losses.
- ChatGPT is evaluated via in-context learning in few-shot settings; specifically, we tested the 0-shot, 2-shot, and 4-shot settings.
- BERT/RoBERTa/DeBERTa-Pair-CLS directly applies pre-trained language models for sentence pair classification. Notably, CompAQA distinguishes itself from these baselines by incorporating an order-based data augmentation strategy.
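The order-based data augmentation distinguishing CompAQA from the pair-classification baselines can be illustrated as follows. This sketch is our reading, not the authors' exact procedure: it assumes binary labels where 1 means the first argument is the better one, and it adds the swapped version of each pair with a flipped label so the model cannot exploit argument order.

```python
from typing import List, Tuple

Pair = Tuple[str, str, int]  # (argument_1, argument_2, label)

def order_augment(pairs: List[Pair]) -> List[Pair]:
    """For every labeled pair (a, b, y), also emit the swapped pair
    (b, a, 1 - y), so the classifier sees both argument orders and
    cannot exploit positional bias."""
    augmented = list(pairs)
    augmented.extend((b, a, 1 - y) for a, b, y in pairs)
    return augmented

train = [("argument A", "argument B", 1)]
augmented = order_augment(train)
# augmented -> [("argument A", "argument B", 1), ("argument B", "argument A", 0)]
```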
5. Results and Discussions
5.1. Main Results
5.2. Ablation Study
5.3. Results of Fine-Tuning Decoder-Only Pre-Trained Models
5.4. Hyperparameter Analysis
5.5. Threats to Validity
- Generalizability: When applied to new datasets, our method may require hyperparameter tuning to achieve optimal performance; in particular, the values of l and m might need adjustment to accommodate different data characteristics.
- Computational Complexity: Our method must compare a target argument with multiple reference arguments to predict its quality score. While effective, this multi-comparison approach inherently incurs a higher computational cost.
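The cost concern above can be made concrete with a rough count of encoder forward passes; the function below is an illustration under our own assumptions, not a measurement of the actual implementation.

```python
def forward_passes(n_targets: int, n_references: int,
                   both_orders: bool = False) -> int:
    """Rough inference cost in encoder forward passes: a comparison-based
    scorer needs one pass per (target, reference) pair, optionally doubled
    if both argument orders are evaluated.  A pointwise scorer corresponds
    to n_references = 1."""
    passes = n_targets * n_references
    return 2 * passes if both_orders else passes

# Scoring 1,000 arguments against 10 reference arguments:
cost = forward_passes(1000, 10)            # comparison-based scorer
cost_pointwise = forward_passes(1000, 1)   # pointwise-scorer analogue
```

Under these assumptions, inference cost grows linearly in the number of reference arguments, which is the trade-off the paragraph above acknowledges.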
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Stede, M.; Schneider, J.; Hirst, G. Argumentation Mining; Springer: Berlin/Heidelberg, Germany, 2019.
- Lawrence, J.; Reed, C. Argument Mining: A Survey. Comput. Linguist. 2019, 45, 765–818.
- Vecchi, E.M.; Falk, N.; Jundi, I.; Lapesa, G. Towards Argument Mining for Social Good: A Survey. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021 (Volume 1: Long Papers), Virtual Event, 1–6 August 2021; pp. 1338–1352.
- Ye, Y.; Teufel, S. End-to-End Argument Mining as Biaffine Dependency Parsing. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, EACL 2021, Online, 19–23 April 2021; pp. 669–678.
- Morio, G.; Ozaki, H.; Morishita, T.; Yanai, K. End-to-end Argument Mining with Cross-corpora Multi-task Learning. Trans. Assoc. Comput. Linguist. 2022, 10, 639–658.
- Bao, J.; He, Y.; Sun, Y.; Liang, B.; Du, J.; Qin, B.; Yang, M.; Xu, R. A Generative Model for End-to-End Argument Mining with Reconstructed Positional Encoding and Constrained Pointer Mechanism. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 10437–10449.
- Jo, Y.; Bang, S.; Reed, C.; Hovy, E.H. Classifying Argumentative Relations Using Logical Mechanisms and Argumentation Schemes. Trans. Assoc. Comput. Linguist. 2021, 9, 721–739.
- Sun, Y.; Liang, B.; Bao, J.; Yang, M.; Xu, R. Probing Structural Knowledge from Pre-trained Language Model for Argumentation Relation Classification. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 3605–3615.
- Saadat-Yazdi, A.; Pan, J.Z.; Kökciyan, N. Uncovering Implicit Inferences for Improved Relational Argument Mining. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, 2–6 May 2023; pp. 2476–2487.
- Schiller, B.; Daxenberger, J.; Gurevych, I. Aspect-Controlled Neural Argument Generation. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, 6–11 June 2021; pp. 380–396.
- Saha, S.; Srihari, R.K. ArgU: A Controllable Factual Argument Generator. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 8373–8388.
- Alshomary, M.; Wachsmuth, H. Conclusion-based Counter-Argument Generation. In Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2023, Dubrovnik, Croatia, 2–6 May 2023; pp. 957–967.
- Gretz, S.; Friedman, R.; Cohen-Karlik, E.; Toledo, A.; Lahav, D.; Aharonov, R.; Slonim, N. A Large-Scale Dataset for Argument Quality Ranking: Construction and Analysis. In Proceedings of the Thirty-Fourth AAAI Conference on Artificial Intelligence, AAAI 2020, The Thirty-Second Innovative Applications of Artificial Intelligence Conference, IAAI 2020, The Tenth AAAI Symposium on Educational Advances in Artificial Intelligence, EAAI 2020, New York, NY, USA, 7–12 February 2020; pp. 7805–7813.
- Marro, S.; Cabrio, E.; Villata, S. Graph Embeddings for Argumentation Quality Assessment. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 4154–4164.
- Joshi, O.; Pitre, P.; Haribhakta, Y. ArgAnalysis35K: A large-scale dataset for Argument Quality Analysis. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics, ACL 2023, Toronto, ON, Canada, 9–14 July 2023; Volume 1: Long Papers, pp. 13916–13931.
- Stab, C.; Gurevych, I. Annotating Argument Components and Relations in Persuasive Essays. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers, Dublin, Ireland, 23–29 August 2014; pp. 1501–1510.
- Stab, C.; Gurevych, I. Recognizing Insufficiently Supported Arguments in Argumentative Essays. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, 3–7 April 2017; Volume 1: Long Papers, pp. 980–990.
- Wachsmuth, H.; Potthast, M.; Khatib, K.A.; Ajjour, Y.; Puschmann, J.; Qu, J.; Dorsch, J.; Morari, V.; Bevendorff, J.; Stein, B. Building an Argument Search Engine for the Web. In Proceedings of the 4th Workshop on Argument Mining, ArgMining@EMNLP 2017, Copenhagen, Denmark, 8 September 2017; pp. 49–59.
- Persing, I.; Ng, V. Modeling Thesis Clarity in Student Essays. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, ACL 2013, Sofia, Bulgaria, 4–9 August 2013; Volume 1: Long Papers, pp. 260–269.
- Ding, Y.; Bexte, M.; Horbach, A. Score It All Together: A Multi-Task Learning Study on Automatic Scoring of Argumentative Essays. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 13052–13063.
- Toledo, A.; Gretz, S.; Cohen-Karlik, E.; Friedman, R.; Venezian, E.; Lahav, D.; Jacovi, M.; Aharonov, R.; Slonim, N. Automatic Argument Quality Assessment—New Datasets and Methods. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, EMNLP-IJCNLP 2019, Hong Kong, China, 3–7 November 2019; pp. 5624–5634.
- Slonim, N.; Bilu, Y.; Alzate, C.; Bar-Haim, R.; Bogin, B.; Bonin, F.; Choshen, L.; Cohen-Karlik, E.; Dankin, L.; Edelstein, L.; et al. An autonomous debating system. Nature 2021, 591, 379–384.
- Wachsmuth, H.; Naderi, N.; Hou, Y.; Bilu, Y.; Prabhakaran, V.; Thijm, T.A.; Hirst, G.; Stein, B. Computational Argumentation Quality Assessment in Natural Language. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, 3–7 April 2017; Volume 1: Long Papers, pp. 176–187.
- Wachsmuth, H.; Werner, T. Intrinsic Quality Assessment of Arguments. In Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), 8–13 December 2020; pp. 6739–6745.
- Habernal, I.; Gurevych, I. Which argument is more convincing? Analyzing and predicting convincingness of Web arguments using bidirectional LSTM. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, ACL 2016, Berlin, Germany, 7–12 August 2016; Volume 1: Long Papers.
- Gienapp, L.; Stein, B.; Hagen, M.; Potthast, M. Efficient Pairwise Annotation of Argument Quality. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020, Online, 5–10 July 2020; pp. 5772–5781.
- Devlin, J.; Chang, M.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2019, Minneapolis, MN, USA, 2–7 June 2019; Volume 1 (Long and Short Papers), pp. 4171–4186.
- Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
- Wolf, T.; Debut, L.; Sanh, V.; Chaumond, J.; Delangue, C.; Moi, A.; Cistac, P.; Rault, T.; Louf, R.; Funtowicz, M.; et al. Transformers: State-of-the-Art Natural Language Processing. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, EMNLP 2020, Online, 16–20 November 2020; pp. 38–45.
- Clark, D.B.; Sampson, V.D. Analyzing the quality of argumentation supported by personally-seeded discussions. In Proceedings of the 2005 Conference on Computer Support for Collaborative Learning, CSCL ’05, Taipei, Taiwan, 30 May–4 June 2005; pp. 76–85.
- Wachsmuth, H.; Stein, B.; Ajjour, Y. “PageRank” for Argument Relevance. In Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics, EACL 2017, Valencia, Spain, 3–7 April 2017; Volume 1: Long Papers, pp. 1117–1127.
- Swanson, R.; Ecker, B.; Walker, M.A. Argument Mining: Extracting Arguments from Online Dialogue. In Proceedings of the SIGDIAL 2015 Conference, the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, Prague, Czech Republic, 2–4 September 2015; pp. 217–226.
- Wachsmuth, H.; Khatib, K.A.; Stein, B. Using Argument Mining to Assess the Argumentation Quality of Essays. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, Osaka, Japan, 11–16 December 2016; pp. 1680–1691.
- Johnson, R.H.; Blair, J.A. Logical Self-Defense; McGraw-Hill: Toronto, ON, Canada, 1977.
- Hamblin, C.L. Fallacies. Tijdschr. Voor Filos. 1970, 33, 183–188.
- Lauscher, A.; Ng, L.; Napoles, C.; Tetreault, J.R. Rhetoric, Logic, and Dialectic: Advancing Theory-based Argument Quality Assessment in Natural Language Processing. In Proceedings of the 28th International Conference on Computational Linguistics, COLING 2020, Barcelona, Spain (Online), 8–13 December 2020; pp. 4563–4574.
- Persing, I.; Ng, V. Modeling Argument Strength in Student Essays. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing, ACL 2015, Beijing, China, 26–31 July 2015; Volume 1: Long Papers, pp. 543–552.
- Skitalinskaya, G.; Klaff, J.; Wachsmuth, H. Learning From Revisions: Quality Assessment of Claims in Argumentation at Scale. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, EACL 2021, Online, 19–23 April 2021; pp. 1718–1729.
- Favreau, C.; Zouaq, A.; Bhatnagar, S. Learning to Rank with BERT for Argument Quality Evaluation. In Proceedings of the Thirty-Fifth International Florida Artificial Intelligence Research Society Conference, FLAIRS 2022, Hutchinson Island, Jensen Beach, FL, USA, 15–18 May 2022.
- Wang, Y.; Chen, X.; He, B.; Sun, L. Contextual Interaction for Argument Post Quality Assessment. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, EMNLP 2023, Singapore, 6–10 December 2023; pp. 10420–10432.
- Falk, N.; Lapesa, G. Bridging Argument Quality and Deliberative Quality Annotations with Adapters. In Proceedings of the Findings of the Association for Computational Linguistics: EACL 2023, Dubrovnik, Croatia, 2–6 May 2023; pp. 2424–2443.
- Fromm, M.; Berrendorf, M.; Faerman, E.; Seidl, T. Cross-Domain Argument Quality Estimation. In Proceedings of the Findings of the Association for Computational Linguistics: ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 13435–13448.
- Zhang, S.; Shen, Y.; Tan, Z.; Wu, Y.; Lu, W. De-Bias for Generative Extraction in Unified NER Task. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics, ACL 2022, Dublin, Ireland, 22–27 May 2022; Volume 1: Long Papers, pp. 808–818.
- Hu, M.; Wu, Y.; Gao, H.; Bai, Y.; Zhao, S. Improving Aspect Sentiment Quad Prediction via Template-Order Data Augmentation. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 7889–7900.
- Gou, Z.; Guo, Q.; Yang, Y. MvP: Multi-view Prompting Improves Aspect Sentiment Tuple Prediction. In Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), ACL 2023, Toronto, ON, Canada, 9–14 July 2023; pp. 4380–4397.
- Loshchilov, I.; Hutter, F. Decoupled Weight Decay Regularization. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019.
- Pennington, J.; Socher, R.; Manning, C.D. GloVe: Global Vectors for Word Representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, EMNLP 2014, Doha, Qatar, 25–29 October 2014; pp. 1532–1543.
- Huang, Y.; Fang, M.; Cao, Y.; Wang, L.; Liang, X. DAGN: Discourse-Aware Graph Network for Logical Reasoning. In Proceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, NAACL-HLT 2021, Online, 6–11 June 2021; pp. 5848–5855.
- Hu, E.J.; Shen, Y.; Wallis, P.; Allen-Zhu, Z.; Li, Y.; Wang, S.; Wang, L.; Chen, W. LoRA: Low-Rank Adaptation of Large Language Models. In Proceedings of the Tenth International Conference on Learning Representations, ICLR 2022, Virtual Event, 25–29 April 2022.
| Task | Input | Output |
|---|---|---|
| Pairwise Argument Quality Classification | Argument 1: We should ban gambling because there is no benefit to allowing it. Argument 2: We should ban gambling because it preys on people with addictions to make a few wealthy casino owners richer. | Binary Label: 0 |
| Argument Quality Ranking | Gambling doesn’t benefit society in that it doesn’t produce anything in the way farms provide food or engineers create new technologies or artists create beauty. | Quality Score: 0.80 |
Argument quality ranking results; the first five metric columns are on IBM-ArgQ-5.3kArgs and the last five on IBM-Rank-30k:

| Model | Pear. | Spear. | TAU | MAE ↓ | NDCG@15 | Pear. | Spear. | TAU | MAE ↓ | NDCG@15 |
|---|---|---|---|---|---|---|---|---|---|---|
| SVM BOW [13] | - | - | - | - | - | 0.3200 | 0.3100 | - | - | - |
| Bi-LSTM GloVe [13] | - | - | - | - | - | 0.4400 | 0.4100 | - | - | - |
| TFR-BERT [39] | 0.3500 | 0.3400 | 0.2300 | - | 0.6600 | 0.5200 | 0.4700 | 0.3200 | - | 0.8800 |
| BERT [13] | - | - | - | - | - | 0.5200 | 0.4800 | - | - | - |
| BERT † | 0.3902 | 0.3755 | 0.2597 | 0.1560 | 0.7565 | 0.5201 | 0.4794 | 0.3301 | 0.1328 | 0.9300 |
| CI-BERT † | 0.4101 | 0.3959 | 0.2703 | 0.1544 | 0.7388 | 0.5230 | 0.4845 | 0.3380 | 0.1330 | 0.9487 |
| CompAQA-BERT (Ours) | 0.4563 | 0.4417 | 0.3064 | 0.1580 | 0.8097 | 0.5282 | 0.4830 | 0.3390 | 0.1311 | 0.9635 |
| RoBERTa [40] | - | - | - | - | - | 0.5283 | 0.4858 | - | - | 0.9427 |
| RoBERTa † | 0.4132 | 0.3908 | 0.2716 | 0.1533 | 0.7729 | 0.5311 | 0.4872 | 0.3545 | 0.1348 | 0.9507 |
| CI-RoBERTa † | 0.4612 | 0.4377 | 0.3013 | 0.1633 | 0.7385 | 0.5439 | 0.5056 | 0.3554 | 0.1339 | 0.9668 |
| CompAQA-RoBERTa (Ours) | 0.4681 | 0.4585 | 0.3165 | 0.1517 | 0.7630 | 0.5642 | 0.5204 | 0.3670 | 0.1299 | 0.9543 |
| DeBERTa | 0.4181 | 0.4030 | 0.2777 | 0.1562 | 0.7497 | 0.5604 | 0.5154 | 0.3643 | 0.1667 | 0.9481 |
| CompAQA-DeBERTa (Ours) | 0.4657 | 0.4536 | 0.3127 | 0.1652 | 0.7352 | 0.5797 | 0.5373 | 0.3794 | 0.1371 | 0.9500 |
| CI-BERT (Reported) [40] | - | - | - | - | - | 0.5375 | 0.4949 | - | - | 0.9388 |
| CI-RoBERTa (Reported) [40] | - | - | - | - | - | 0.5604 | 0.5174 | - | - | 0.9648 |
| ChatGPT-0-Shot | 0.3720 | 0.4043 | 0.2890 | 0.2353 | 0.7148 | 0.2496 | 0.2464 | 0.1749 | 0.2109 | 0.8217 |
| ChatGPT-2-Shot | 0.3466 | 0.3178 | 0.2194 | 0.1996 | 0.7304 | 0.2421 | 0.2335 | 0.1591 | 0.2081 | 0.8524 |
| ChatGPT-4-Shot | 0.3394 | 0.3126 | 0.2166 | 0.1997 | 0.6893 | 0.2315 | 0.2357 | 0.1608 | 0.2107 | 0.8522 |
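The ranking metrics reported in the tables (Pearson, Spearman, Kendall's tau, MAE) can be computed with standard SciPy routines; the score vectors below are toy values for illustration only, and NDCG@15 is omitted for brevity.

```python
import numpy as np
from scipy.stats import kendalltau, pearsonr, spearmanr

def ranking_metrics(pred, gold) -> dict:
    """Pearson, Spearman, Kendall's tau, and MAE between predicted
    and gold argument quality scores."""
    pred = np.asarray(pred, dtype=float)
    gold = np.asarray(gold, dtype=float)
    return {
        "Pear.": pearsonr(pred, gold)[0],
        "Spear.": spearmanr(pred, gold)[0],
        "TAU": kendalltau(pred, gold)[0],
        "MAE": float(np.mean(np.abs(pred - gold))),
    }

metrics = ranking_metrics([0.2, 0.5, 0.9], [0.1, 0.6, 0.8])
```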
Pairwise argument quality classification results on IBM-ArgQ-9.1kPairs:

| Model | Acc. | F1 | AUC |
|---|---|---|---|
| BERT-Pair-CLS | 74.26 | 73.99 | 82.88 |
| CompAQA-BERT (Ours) | 77.49 | 77.41 | 85.69 |
| RoBERTa-Pair-CLS | 75.29 | 75.11 | 84.21 |
| CompAQA-RoBERTa (Ours) | 80.00 | 79.98 | 88.03 |
| DeBERTa-Pair-CLS | 78.78 | 78.59 | 87.59 |
| CompAQA-DeBERTa (Ours) | 81.17 | 81.16 | 88.83 |
Ablation results; the first five metric columns are on IBM-ArgQ-5.3kArgs and the last five on IBM-Rank-30k:

| Model | Pear. | Spear. | TAU | MAE ↓ | NDCG@15 | Pear. | Spear. | TAU | MAE ↓ | NDCG@15 |
|---|---|---|---|---|---|---|---|---|---|---|
| CompAQA-BERT (Ours) | 0.4563 | 0.4417 | 0.3064 | 0.1580 | 0.8097 | 0.5282 | 0.4830 | 0.3390 | 0.1311 | 0.9635 |
| w/o | 0.4472 | 0.4344 | 0.3015 | 0.1561 | 0.8035 | 0.5257 | 0.4795 | 0.3363 | 0.1321 | 0.9588 |
| w/o | 0.4475 | 0.4337 | 0.3009 | 0.1823 | 0.7987 | 0.5230 | 0.4790 | 0.3358 | 0.1314 | 0.9681 |
| w/o ODA | 0.4235 | 0.4096 | 0.2830 | 0.1558 | 0.7916 | 0.5169 | 0.4716 | 0.3305 | 0.1333 | 0.9477 |
| w/o and | 0.4401 | 0.4240 | 0.2939 | 0.1543 | 0.8134 | 0.5227 | 0.4761 | 0.3339 | 0.1322 | 0.9768 |
| w/o and ODA | 0.4189 | 0.4049 | 0.2802 | 0.1560 | 0.8236 | 0.5136 | 0.4705 | 0.3300 | 0.1331 | 0.9659 |
| w/o and ODA | 0.4168 | 0.4027 | 0.2801 | 0.1566 | 0.8147 | 0.5127 | 0.4700 | 0.3283 | 0.1325 | 0.9601 |
| w/o and and ODA | 0.4030 | 0.3971 | 0.2629 | 0.1590 | 0.7755 | 0.5095 | 0.4640 | 0.3227 | 0.1323 | 0.9577 |
| CompAQA-RoBERTa (Ours) | 0.4681 | 0.4585 | 0.3165 | 0.1517 | 0.7630 | 0.5642 | 0.5204 | 0.3670 | 0.1299 | 0.9543 |
| w/o | 0.4550 | 0.4480 | 0.3074 | 0.1553 | 0.7399 | 0.5647 | 0.5192 | 0.3661 | 0.1319 | 0.9589 |
| w/o | 0.4626 | 0.4470 | 0.3084 | 0.1510 | 0.7385 | 0.5601 | 0.5125 | 0.3610 | 0.1301 | 0.9325 |
| w/o ODA | 0.4646 | 0.4492 | 0.3108 | 0.1491 | 0.7558 | 0.5605 | 0.5167 | 0.3643 | 0.1310 | 0.9584 |
| w/o and | 0.4541 | 0.4448 | 0.3059 | 0.1644 | 0.7417 | 0.5572 | 0.5092 | 0.3589 | 0.1286 | 0.9234 |
| w/o and ODA | 0.4603 | 0.4453 | 0.3067 | 0.1498 | 0.7813 | 0.5589 | 0.5155 | 0.3633 | 0.1319 | 0.9697 |
| w/o and ODA | 0.4580 | 0.4435 | 0.3063 | 0.1527 | 0.7743 | 0.5567 | 0.5052 | 0.3559 | 0.1287 | 0.9386 |
| w/o and and ODA | 0.4548 | 0.4430 | 0.3014 | 0.1507 | 0.7223 | 0.5506 | 0.5047 | 0.3530 | 0.1312 | 0.9276 |
| CompAQA-DeBERTa (Ours) | 0.4657 | 0.4536 | 0.3127 | 0.1652 | 0.7352 | 0.5797 | 0.5373 | 0.3794 | 0.1371 | 0.9500 |
| w/o | 0.4625 | 0.4512 | 0.3109 | 0.1700 | 0.7376 | 0.5768 | 0.5328 | 0.3758 | 0.1355 | 0.9367 |
| w/o | 0.4478 | 0.4370 | 0.3001 | 0.1663 | 0.7465 | 0.5737 | 0.5299 | 0.3736 | 0.1360 | 0.9367 |
| w/o ODA | 0.4068 | 0.3919 | 0.2702 | 0.1557 | 0.7293 | 0.5632 | 0.5205 | 0.3665 | 0.1527 | 0.9517 |
| w/o and | 0.4377 | 0.4278 | 0.2930 | 0.1683 | 0.7103 | 0.5728 | 0.5271 | 0.3714 | 0.1420 | 0.9557 |
| w/o and ODA | 0.4013 | 0.3896 | 0.2673 | 0.1679 | 0.7334 | 0.5603 | 0.5117 | 0.3597 | 0.1494 | 0.9781 |
| w/o and ODA | 0.4008 | 0.3888 | 0.2666 | 0.1574 | 0.7145 | 0.5631 | 0.5172 | 0.3646 | 0.1844 | 0.9508 |
| w/o and and ODA | 0.3953 | 0.3806 | 0.2595 | 0.1616 | 0.7165 | 0.5562 | 0.5037 | 0.3591 | 0.1387 | 0.9404 |
Results of fine-tuning decoder-only pre-trained models on IBM-Rank-30k:

| Model | Pear. | Spear. | TAU | MAE ↓ | NDCG@15 |
|---|---|---|---|---|---|
| Llama | 0.6103 | 0.5658 | 0.4035 | 0.1343 | 0.9252 |
| CI-Llama | 0.6178 | 0.5738 | 0.4095 | 0.1322 | 0.9555 |
| CompAQA-Llama (Ours) | 0.6270 | 0.5881 | 0.4190 | 0.1313 | 0.9521 |
Hyperparameter analysis of CompAQA-DeBERTa on IBM-Rank-30k:

| CompAQA-DeBERTa | Pear. | Spear. | TAU | MAE ↓ | NDCG@15 |
|---|---|---|---|---|---|
|  | 0.5810 | 0.5356 | 0.3780 | 0.1292 | 0.9461 |
|  | 0.5797 | 0.5373 | 0.3794 | 0.1371 | 0.9500 |
|  | 0.5776 | 0.5334 | 0.3763 | 0.1323 | 0.9496 |
|  | 0.5781 | 0.5342 | 0.3768 | 0.1309 | 0.9448 |
| CompAQA-DeBERTa | Pear. | Spear. | TAU | MAE ↓ | NDCG@15 |
|---|---|---|---|---|---|
|  | 0.5739 | 0.5313 | 0.3749 | 0.1320 | 0.9501 |
|  | 0.5797 | 0.5373 | 0.3794 | 0.1371 | 0.9500 |
|  | 0.5770 | 0.5323 | 0.3759 | 0.1281 | 0.9308 |
|  | 0.5768 | 0.5335 | 0.3765 | 0.1267 | 0.9432 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Bao, J.; Jin, B.; Sun, Y.; Zhang, Y.; He, Y.; Xu, R. A Comparison-Based Framework for Argument Quality Assessment. Electronics 2024, 13, 4088. https://doi.org/10.3390/electronics13204088