Quality Control for Distantly-Supervised Data-to-Text Generation via Meta Learning
Abstract
1. Introduction
- A meta-learning-based corpus reweight module for distantly-supervised data-to-text generation is proposed to mitigate the negative impact of a noisy training corpus on the neural data-to-text model.
- A corpus rewrite module, which reduces noise in low-quality training instances, is introduced to provide a corpus with better fidelity for training the neural data-to-text model.
- A new distantly-supervised data-to-text generation corpus, called DIST-ToTTo, is constructed. Evaluation results on both WITA and DIST-ToTTo demonstrate that our proposed corpus reweight and rewrite modules improve the performance of both the base model and the state-of-the-art model in generating faithful text.
2. Related Work
2.1. Data-To-Text Generation
2.2. Meta Learning
3. Background
3.1. Task Definition
3.2. Base Models
4. Approach
- In Section 4.1, we construct an oracle training subset, which provides guidance on which training instances are of high quality for the model.
- In Section 4.2, we propose a corpus reweight module, which uses meta learning to dynamically adjust the weights of training instances during training, in order to mitigate the negative impact of low-quality training instances (a minimal sketch follows this list).
- In Section 4.3, we propose a corpus rewrite module that transforms the noisiest data-text training pairs into better-aligned ones, guiding the model to generate text more faithfully.
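To make the reweight module concrete, the following is a minimal PyTorch sketch of the meta-learning lookahead in the spirit of learning-to-reweight [35], on which this module builds: instance weights are obtained from the gradient of the oracle-subset loss after a virtual one-step update on the weighted noisy batch. The tensor shapes, the per-instance loss, and all names are illustrative assumptions, not the paper's exact implementation.

```python
import torch
from torch.func import functional_call, grad

def per_instance_nll(logits, targets):
    # Token-level cross-entropy summed per sequence; assumes logits of shape
    # [batch, time, vocab] and integer targets of shape [batch, time].
    nll = torch.nn.functional.cross_entropy(
        logits.transpose(1, 2), targets, reduction="none")  # [batch, time]
    return nll.sum(dim=1)                                   # [batch]

def meta_instance_weights(model, loss_fn, noisy_x, noisy_y,
                          oracle_x, oracle_y, lr):
    params = {k: v.detach() for k, v in model.named_parameters()}

    def oracle_loss_after_step(eps):
        # Loss on the noisy batch, weighted by the candidate weights eps.
        def weighted_loss(p):
            out = functional_call(model, p, (noisy_x,))
            return (eps * loss_fn(out, noisy_y)).sum()
        # Virtual one-step SGD update of the model parameters.
        g = grad(weighted_loss)(params)
        stepped = {k: params[k] - lr * g[k] for k in params}
        # Evaluate the stepped model on the clean oracle batch (Section 4.1).
        out = functional_call(model, stepped, (oracle_x,))
        return loss_fn(out, oracle_y).mean()

    # Instances whose upweighting would reduce the oracle loss receive a
    # positive weight; harmful (noisy) instances are clamped to zero, and the
    # surviving weights are normalized.
    eps0 = torch.zeros(noisy_y.shape[0], device=noisy_y.device)
    eps_grad = grad(oracle_loss_after_step)(eps0)
    w = torch.clamp(-eps_grad, min=0.0)
    return w / (w.sum() + 1e-8)
```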
4.1. Oracle Training Subset Construction
4.2. Corpus Reweight Module
4.3. Corpus Rewrite Module
4.4. Training
4.5. Algorithm
Algorithm 1: Our approach for training a data-to-text generation model from noisy data.
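The following hypothetical driver loop illustrates how the modules could fit together in Algorithm 1, reusing `meta_instance_weights` and `per_instance_nll` from the earlier sketch. The loader pairing, the `batch.x`/`batch.y` fields, the `rewrite` helper, and the lowest-weight rewrite-selection rule are assumptions for illustration rather than the paper's exact procedure.

```python
import torch

def train_from_noisy_corpus(model, corpus_loader, oracle_loader, optimizer,
                            lr, num_epochs, rewrite_fraction=0.1):
    for _ in range(num_epochs):
        for batch, oracle_batch in zip(corpus_loader, oracle_loader):
            # Corpus reweight (Section 4.2): per-instance weights from the
            # meta-learning lookahead against the oracle subset (Section 4.1).
            w = meta_instance_weights(model, per_instance_nll,
                                      batch.x, batch.y,
                                      oracle_batch.x, oracle_batch.y, lr)
            # Corpus rewrite (Section 4.3): send the lowest-weight (noisiest)
            # data-text pairs to the rewrite module; `rewrite` is an assumed
            # helper standing in for the module described in Section 4.3.
            k = int(rewrite_fraction * len(w))
            for i in torch.topk(-w, k).indices.tolist():
                batch.y[i] = rewrite(batch.x[i], batch.y[i])
            # Weighted maximum-likelihood update (Section 4.4).
            loss = (w * per_instance_nll(model(batch.x), batch.y)).sum()
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```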
5. Experiments
5.1. Setup
5.1.1. Dataset
5.1.2. Evaluation Metrics
- BLEU [42]: It evaluates the quality of the generated text based on the n-gram overlap between the generated text and the reference text. BLEU ranges from 0% to 100%: if the generated text is identical to the reference, the score is 100%; if it shares no n-gram overlap with the reference, the score is 0%. The calculation of BLEU is illustrated in Equation (11), $\mathrm{BLEU} = \mathrm{BP} \cdot \exp\left( \sum_{n=1}^{N} w_n \log p_n \right)$, which consists of three parts: $p_n$ is the n-gram precision of the generated text compared to the reference; $w_n$ is a positive weight for each n-gram order; and $\mathrm{BP}$ is the brevity penalty, which penalizes generated texts that are shorter than the reference, since shorter texts can otherwise obtain higher precision. In practice, we report the BLEU score with $N = 4$ and uniform weights $w_n = 1/N$ (a minimal implementation is sketched after this list).
- METEOR [44]: In addition to exact string matching between words in the generated text and the reference, it uses WordNet to match words that share the same stem or are synonyms of each other, since such words share the same meaning. Furthermore, it groups the matched words into chunks and uses this to measure how well ordered the words in the generated text are with respect to the reference. Equation (12) shows how the METEOR score is calculated: $\mathrm{METEOR} = F_{\mathrm{mean}} \cdot (1 - \mathrm{Penalty})$, where $F_{\mathrm{mean}}$ combines the precision and the recall of the generated text, and the penalty is based on the number of chunks in the matched sequence: the fewer chunks the matches are grouped into, the better ordered the words are compared to the reference.
- CIDEr [46]: It uses Term Frequency-Inverse Document Frequency (TF-IDF) weights to characterize the similarity between the generated text and the reference, computing the CIDEr score with a cosine similarity function. The idea of using TF-IDF is that it gives higher weight to words that occur infrequently in the dataset and lower weight to commonly occurring words, as the latter are less informative. The details can be found in Vedantam et al. [46].
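For concreteness, here is a minimal, self-contained sentence-level BLEU in Python, mirroring the three components of Equation (11): clipped n-gram precisions $p_n$, uniform weights $w_n = 1/N$, and the brevity penalty. Smoothing is deliberately omitted, so this is a sketch rather than a drop-in replacement for a standard toolkit implementation.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Sentence-level BLEU with uniform weights w_n = 1/N and brevity penalty."""
    weights = [1.0 / max_n] * max_n
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_ngrams = Counter(tuple(candidate[i:i + n])
                              for i in range(len(candidate) - n + 1))
        ref_ngrams = Counter(tuple(reference[i:i + n])
                             for i in range(len(reference) - n + 1))
        # Clipped n-gram precision p_n: each candidate n-gram counts at most
        # as often as it appears in the reference.
        overlap = sum(min(c, ref_ngrams[g]) for g, c in cand_ngrams.items())
        total = max(sum(cand_ngrams.values()), 1)
        if overlap == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty BP penalizes candidates shorter than the reference.
    c, r = len(candidate), len(reference)
    bp = 1.0 if c > r else math.exp(1 - r / max(c, 1))
    return 100 * bp * math.exp(sum(w * lp
                                   for w, lp in zip(weights, log_precisions)))

# Example: identical sentences score 100%.
print(bleu("the keys of the kingdom is a novel".split(),
           "the keys of the kingdom is a novel".split()))  # -> 100.0
```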
5.1.3. Implementation Details
5.2. Results
5.2.1. Comparing Methods
- S2SG is a variant of S2SL that uses a GRU-based [51] encoder–decoder framework instead of the LSTM-based one in S2SL.
- DSG is the current state-of-the-art model [6] for this task. It trains an estimator to penalize unrelated words in the vocabulary based on the structured-data input and uses it to rebalance the beam search (an illustrative sketch follows this list).
- S2ST + Meta can be considered an ablation that applies only the corpus reweight module to the S2ST model.
- S2ST + Full applies our full approach, consisting of both the corpus reweight and corpus rewrite modules, to the S2ST model.
- D + Meta can be considered an ablation that applies only the corpus reweight module to the DSG model.
- D + Full is the full model, which applies both the corpus reweight and corpus rewrite modules to the state-of-the-art DSG model.
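As a rough illustration of the rebalancing idea behind DSG [6], the sketch below shifts each vocabulary token's beam-search log-probability by a supportiveness score derived from the structured-data input. The additive-in-log-space form and the weight `lam` are assumptions; DSG's actual estimator and combination rule may differ.

```python
import torch

def rebalanced_beam_scores(token_logprobs, supportiveness, lam=1.0):
    # token_logprobs: [beam, vocab] log P(word | prefix, data) from the decoder.
    # supportiveness: [vocab] scores in (0, 1] from an estimator of how well
    # each word is supported by the structured-data input.
    # Words unsupported by the input receive a large negative shift and are
    # therefore demoted during beam search.
    return token_logprobs + lam * torch.log(supportiveness + 1e-8)
```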
5.2.2. Automatic Evaluation
5.2.3. Human Evaluation
5.3. Analysis
5.3.1. Over-Generation Error Analysis
5.3.2. Noise Effect Analysis
5.3.3. Case Study
5.3.4. Rewrite Analysis
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
| Symbol | Description |
|---|---|
| E | A training instance, consisting of the model's input and target. |
| D | Structured data serving as the model's input. |
| T | Text about the structured data, which is the model's target. |
| | A triple, representing one unit of the structured data. |
| | The entity's name. |
| | The type of the information. |
| | The value of the triple. |
| | One word in the target text. |
| S | The supporting matrix in the DSG (Distant Supervision Generation) model. |
| s | DSG's aggregated supportiveness score vector. |
| | Noun phrases in the target text. |
| | Data quality confidence score. |
| | An oracle training subset with better-aligned training instances. |
| | A batch of training data. |
| | Parameters of the text generation model. |
| | The weight of each training instance, learned via meta learning. |
| | Learning rate. |
References
- Lebret, R.; Grangier, D.; Auli, M. Neural Text Generation from Structured Data with Application to the Biography Domain. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; pp. 1203–1213. [Google Scholar]
- Wiseman, S.; Shieber, S.; Rush, A. Challenges in Data-to-Document Generation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, Copenhagen, Denmark, 7–11 September 2017; pp. 2253–2263. [Google Scholar]
- Puduppully, R.; Dong, L.; Lapata, M. Data-to-text Generation with Entity Modeling. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Florence, Italy, 28 July 2019; pp. 2023–2035. [Google Scholar]
- Uehara, Y.; Ishigaki, T.; Aoki, K.; Noji, H.; Goshima, K.; Kobayashi, I.; Takamura, H.; Miyao, Y. Learning with Contrastive Examples for Data-to-Text Generation. In Proceedings of the 28th International Conference on Computational Linguistics, Online, 8–13 December 2020; pp. 2352–2362. [Google Scholar]
- Guo, B.; Wang, H.; Ding, Y.; Wu, W.; Hao, S.; Sun, Y.; Yu, Z. Conditional text generation for harmonious human–machine interaction. ACM Trans. Intell. Syst. Technol. (TIST) 2021, 12, 1–50. [Google Scholar] [CrossRef]
- Fu, Z.; Shi, B.; Lam, W.; Bing, L.; Liu, Z. Partially-Aligned Data-to-Text Generation with Distant Supervision. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 9183–9193. [Google Scholar]
- Chen, D.L.; Mooney, R.J. Learning to Sportscast: A Test of Grounded Language Acquisition. In Proceedings of the 25th International Conference on Machine Learning, Helsinki, Finland, 5–9 July 2008; pp. 128–135. [Google Scholar]
- Parikh, A.; Wang, X.; Gehrmann, S.; Faruqui, M.; Dhingra, B.; Yang, D.; Das, D. ToTTo: A Controlled Table-To-Text Generation Dataset. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 1173–1186. [Google Scholar]
- Kukich, K. Design of a Knowledge-Based Report Generator. In Proceedings of the 21st Annual Meeting of the Association for Computational Linguistics, Cambridge, MA, USA, 15–17 June 1983; pp. 145–150. [Google Scholar]
- McKeown, K.R. Text Generation: Using Discourse Strategies and Focus Constraints to Generate Natural Language Text; Cambridge University Press: Cambridge, UK, 1985. [Google Scholar]
- Gong, H.; Feng, X.; Qin, B.; Liu, T. Table-to-Text Generation with Effective Hierarchical Encoder on Three Dimensions (Row, Column and Time). In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 3143–3152. [Google Scholar]
- Mei, H.; Bansal, M.; Walter, M.R. What to talk about and how? Selective Generation using LSTMs with Coarse-to-Fine Alignment. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, San Diego, CA, USA, 12–17 June 2016; pp. 720–730. [Google Scholar]
- Wang, Z.; Wang, X.; An, B.; Yu, D.; Chen, C. Towards Faithful Neural Table-to-Text Generation with Content-Matching Constraints. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 1072–1086. [Google Scholar]
- Chen, W.; Su, Y.; Yan, X.; Wang, W.Y. KGPT: Knowledge-Grounded Pre-Training for Data-to-Text Generation. In Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), Online, 16–20 November 2020; pp. 8635–8648. [Google Scholar]
- Chen, W.; Chen, J.; Su, Y.; Chen, Z.; Wang, W.Y. Logical Natural Language Generation from Open-Domain Tables. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 7929–7942. [Google Scholar]
- Nie, F.; Wang, J.; Yao, J.G.; Pan, R.; Lin, C.Y. Operation-guided Neural Networks for High Fidelity Data-To-Text Generation. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, 31 October 2018; pp. 3879–3889. [Google Scholar]
- Suadaa, L.H.; Kamigaito, H.; Funakoshi, K.; Okumura, M.; Takamura, H. Towards Table-to-Text Generation with Numerical Reasoning. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Online, 1–6 August 2021; pp. 1451–1465. [Google Scholar]
- Chen, Z.; Chen, W.; Zha, H.; Zhou, X.; Zhang, Y.; Sundaresan, S.; Wang, W.Y. Logic2Text: High-Fidelity Natural Language Generation from Logical Forms. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2020, Online, 16–20 November 2020; pp. 2096–2111. [Google Scholar]
- Zhang, N.; Ye, H.; Yang, J.; Deng, S.; Tan, C.; Chen, M.; Huang, S.; Huang, F.; Chen, H. LOGEN: Few-shot Logical Knowledge-Conditioned Text Generation with Self-training. arXiv 2021, arXiv:2112.01404. [Google Scholar]
- Chen, Z.; Eavani, H.; Chen, W.; Liu, Y.; Wang, W.Y. Few-Shot NLG with Pre-Trained Language Model. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 183–190. [Google Scholar]
- Li, J.; Tang, T.; Zhao, W.X.; Wei, Z.; Yuan, N.J.; Wen, J.R. Few-shot Knowledge Graph-to-Text Generation with Pretrained Language Models. In Proceedings of the Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021, Online, 1–6 August 2021; pp. 1558–1568. [Google Scholar]
- Jolly, S.; Zhang, Z.X.; Dengel, A.; Mou, L. Search and learn: Improving semantic coverage for data-to-text generation. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 22 February 2022; pp. 10858–10866. [Google Scholar]
- Chang, E.; Shen, X.; Yeh, H.S.; Demberg, V. On Training Instance Selection for Few-Shot Neural Text Generation. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), Online, 1–6 August 2021; pp. 8–13. [Google Scholar]
- Dušek, O.; Howcroft, D.M.; Rieser, V. Semantic Noise Matters for Neural Natural Language Generation. In Proceedings of the 12th International Conference on Natural Language Generation, Tokyo, Japan, 28 October 2019; pp. 421–426. [Google Scholar]
- Vinyals, O.; Fortunato, M.; Jaitly, N. Pointer Networks. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7 December 2015; pp. 2692–2700. [Google Scholar]
- Su, Y.; Meng, Z.; Baker, S.; Collier, N. Few-Shot Table-to-Text Generation with Prototype Memory. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 910–917. [Google Scholar]
- Zhao, W.; Liu, Y.; Wan, Y.; Yu, P. Attend, Memorize and Generate: Towards Faithful Table-to-Text Generation in Few Shots. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2021, Punta Cana, Dominican Republic, 7–11 November 2021; pp. 4106–4117. [Google Scholar]
- Chang, E.; Shen, X.; Zhu, D.; Demberg, V.; Su, H. Neural Data-to-Text Generation with LM-based Text Augmentation. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics, Online, 19–23 April 2021; pp. 758–768. [Google Scholar]
- Chang, E.; Demberg, V.; Marin, A. Jointly Improving Language Understanding and Generation with Quality-Weighted Weak Supervision of Automatic Labeling. In Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume, Online, 19–23 April 2021; pp. 818–829. [Google Scholar]
- Kasner, Z.; Dusek, O. Neural Pipeline for Zero-Shot Data-to-Text Generation. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Dublin, Ireland, 22–27 May 2022; pp. 3914–3932. [Google Scholar]
- Finn, C.; Abbeel, P.; Levine, S. Model-agnostic meta-learning for fast adaptation of deep networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 7–9 August 2017; pp. 1126–1135. [Google Scholar]
- Thrun, S.; Pratt, L. Learning to learn: Introduction and overview. In Learning to Learn; Springer: Boston, MA, USA, 1998; pp. 3–17. [Google Scholar]
- Volpi, R.; Larlus, D.; Rogez, G. Continual adaptation of visual representations via domain randomization and meta-learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, IEEE, Nashville, TN, USA, 20–25 June 2021; pp. 4443–4453. [Google Scholar]
- Wang, C.; Pan, H.; Qiu, M.; Huang, J.; Yang, F.; Zhang, Y. Meta Distant Transfer Learning for Pre-trained Language Models. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, Online and Punta Cana, Dominican Republic, 7–11 November 2021; pp. 9742–9752. [Google Scholar]
- Ren, M.; Zeng, W.; Yang, B.; Urtasun, R. Learning to reweight examples for robust deep learning. In Proceedings of the 35th International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 4334–4343. [Google Scholar]
- Shu, J.; Xie, Q.; Yi, L.; Zhao, Q.; Zhou, S.; Xu, Z.; Meng, D. Meta-Weight-Net: Learning an Explicit Mapping For Sample Weighting. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 8 December 2019; pp. 1919–1930. [Google Scholar]
- Li, Z.; Nie, J.Y.; Wang, B.; Du, P.; Zhang, Y.; Zou, L.; Li, D. Meta-Learning for Neural Relation Classification with Distant Supervision. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, Ireland, 19 October 2020; pp. 815–824. [Google Scholar]
- Wu, L.; Xie, P.; Zhou, J.; Zhang, M.; Chunping, M.; Xu, G.; Zhang, M. Robust Self-Augmentation for Named Entity Recognition with Meta Reweighting. In Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Seattle, WA, USA, 10–15 July 2022; pp. 4049–4060. [Google Scholar]
- Eshratifar, A.E.; Eigen, D.; Pedram, M. Gradient agreement as an optimization objective for meta-learning. In Proceedings of the 2nd Workshop on Meta-Learning at NeurIPS 2018, Montreal, QC, Canada, 8 December 2018. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.U.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4 December 2017; pp. 6000–6010. [Google Scholar]
- Novikova, J.; Dušek, O.; Rieser, V. The E2E Dataset: New Challenges For End-to-End Generation. In Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, Saarbrücken, Germany, 15–17 August 2017; pp. 201–206. [Google Scholar]
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 6 July 2002; pp. 311–318. [Google Scholar]
- Doddington, G. Automatic evaluation of machine translation quality using n-gram co-occurrence statistics. In Proceedings of the Second International Conference on Human Language Technology Research, San Francisco, CA, USA, 24 March 2002; pp. 138–145. [Google Scholar]
- Banerjee, S.; Lavie, A. METEOR: An automatic metric for MT evaluation with improved correlation with human judgments. In Proceedings of the ACL Workshop on Intrinsic and Extrinsic Evaluation Measures for Machine Translation and/or Summarization, Ann Arbor, MI, USA, 23 June 2005; pp. 65–72. [Google Scholar]
- Lin, C.Y. Rouge: A package for automatic evaluation of summaries. In Proceedings of the Text Summarization Branches Out, Barcelona, Spain, 25–26 July 2004; pp. 74–81. [Google Scholar]
- Vedantam, R.; Lawrence Zitnick, C.; Parikh, D. Cider: Consensus-based image description evaluation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, Boston, MA, USA, 7–12 June 2015; pp. 4566–4575. [Google Scholar]
- Ott, M.; Edunov, S.; Baevski, A.; Fan, A.; Gross, S.; Ng, N.; Grangier, D.; Auli, M. fairseq: A Fast, Extensible Toolkit for Sequence Modeling. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (Demonstrations), Minneapolis, MN, USA, 2–7 June 2019; pp. 48–53. [Google Scholar]
- See, A.; Liu, P.J.; Manning, C.D. Get To The Point: Summarization with Pointer-Generator Networks. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Vancouver, BC, Canada, 30 July 2017; pp. 1073–1083. [Google Scholar]
- Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
- Gardent, C.; Shimorina, A.; Narayan, S.; Perez-Beltrachini, L. The WebNLG Challenge: Generating Text from RDF Data. In Proceedings of the 10th International Conference on Natural Language Generation, Santiago de Compostela, Spain, 4–7 September 2017; pp. 124–133. [Google Scholar]
- Cho, K.; van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the Properties of Neural Machine Translation: Encoder–Decoder Approaches. In Proceedings of the SSST-8, Eighth Workshop on Syntax, Semantics and Structure in Statistical Translation, Doha, Qatar, 25 October 2014; pp. 103–111. [Google Scholar]
- Luong, T.; Pham, H.; Manning, C.D. Effective Approaches to Attention-based Neural Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; pp. 1412–1421. [Google Scholar]
- Gu, J.; Lu, Z.; Li, H.; Li, V.O. Incorporating Copying Mechanism in Sequence-to-Sequence Learning. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Berlin, Germany, 7–12 August 2016; pp. 1631–1640. [Google Scholar]
- Fleiss, J.L. Measuring nominal scale agreement among many raters. Psychol. Bull. 1971, 76, 378. [Google Scholar] [CrossRef]
| | WITA | DIST-ToTTo |
|---|---|---|
| Size | 55,400 | 128,533 |
| Text Length | (18.8, 17, 5, 59) | (24.0, 22, 3, 81) |
| KB Number | (3.0, 3, 1, 11) | (3.5, 3, 1, 130) |
| Vocabulary | 102,404 | 153,963 |
| Dataset | Model | BLEU | NIST | METEOR | ROUGE | CIDEr |
|---|---|---|---|---|---|---|
| WITA | S2SL | 48.08 | 8.459 | 38.75 | 70.92 | 4.415 |
| | S2SG | 47.78 | 8.267 | 38.31 | 72.21 | 4.543 |
| | S2ST | 54.52 | 8.631 | 41.85 | 73.81 | 5.045 |
| | DSG | 56.69 | 9.241 | 43.09 | 76.26 | 5.380 |
| | S2ST + Meta | 55.33 | 8.925 | 42.25 | 74.86 | 5.150 |
| | S2ST + Full | 56.34 | 9.097 | 42.75 | 75.71 | 5.255 |
| | D + Meta | 58.00 | 9.304 | 43.85 | 76.72 | 5.486 |
| | D + Full | 58.80 | 9.341 | 44.10 | 77.14 | 5.573 |
| DIST-ToTTo | S2SL | 25.25 | 6.555 | 26.23 | 48.32 | 2.140 |
| | S2SG | 24.41 | 6.727 | 25.73 | 47.72 | 2.050 |
| | S2ST | 35.91 | 8.343 | 31.64 | 54.89 | 2.697 |
| | DSG | 38.22 | 8.323 | 33.09 | 57.60 | 2.950 |
| | S2ST + Meta | 38.52 | 8.869 | 33.11 | 57.14 | 2.898 |
| | S2ST + Full | 39.10 | 8.931 | 33.82 | 57.61 | 2.944 |
| | D + Meta | 39.03 | 8.690 | 33.70 | 58.16 | 3.019 |
| | D + Full | 39.68 | 8.785 | 34.02 | 58.68 | 3.084 |
| Dataset | Model | Grammaticality | Fidelity |
|---|---|---|---|
| WITA | S2ST | 11.61 | 5.72 |
| | S2ST + Full | 16.17 | 12.06 |
| | DSG | 11.33 | 7.44 |
| | D + Full | 12.67 | 11.33 |
| DIST-ToTTo | S2ST | 7.72 | −2.67 |
| | S2ST + Full | 11.67 | 7.50 |
| | DSG | 15.06 | 11.28 |
| | D + Full | 20.00 | 22.78 |
| Dataset | Model | 1-Gram | 2-Gram | 3-Gram | 4-Gram | 5-Gram |
|---|---|---|---|---|---|---|
| WITA | S2ST | 624 | 2232 | 2804 | 2770 | 2492 |
| | S2ST + Meta | 588 | 2158 | 2733 | 2693 | 2406 |
| | S2ST + Full | 569 | 2110 | 2671 | 2625 | 2343 |
| | DSG | 463 | 2041 | 2635 | 2594 | 2309 |
| | D + Meta | 431 | 1998 | 2606 | 2590 | 2317 |
| | D + Full | 434 | 1970 | 2584 | 2572 | 2294 |
| DIST-ToTTo | S2ST | 14234 | 27539 | 28954 | 26320 | 22769 |
| | S2ST + Meta | 12096 | 25496 | 27249 | 24832 | 21343 |
| | S2ST + Full | 11646 | 24927 | 26880 | 24538 | 21081 |
| | DSG | 10183 | 24437 | 26659 | 24473 | 21098 |
| | D + Meta | 9336 | 22620 | 24629 | 22362 | 18954 |
| | D + Full | 9253 | 22426 | 24479 | 22259 | 18863 |
| Drop Percentage | BLEU | NIST | METEOR | ROUGE | CIDEr |
|---|---|---|---|---|---|
| 0% | 54.52 | 8.631 | 41.85 | 73.81 | 5.045 |
| 10% | 53.20 | 8.857 | 41.27 | 73.94 | 5.134 |
| 20% | 52.79 | 8.734 | 41.37 | 73.70 | 5.006 |
| 30% | 51.69 | 8.675 | 40.69 | 72.97 | 4.852 |
| 40% | 48.81 | 8.479 | 39.15 | 71.12 | 4.602 |
| 50% | 46.63 | 8.209 | 37.98 | 69.75 | 4.412 |
| S2ST + Full | 56.34 | 9.097 | 42.75 | 75.71 | 5.255 |
| KB | S2ST | S2ST + Full | DSG | D + Full | Gold |
|---|---|---|---|---|---|
| 〈The Keys of the Kingdom, author, A. J. Cronin〉, 〈The Keys of the Kingdom, genre, novel〉 | The Keys of the Kingdom is a 2003 novel by American author A. J. Cronin. | The Keys of the Kingdom is a novel by A. J. Cronin. | The Keys of the Kingdom is a 2012 novel by A. J. Cronin. | The Keys of the Kingdom is a novel by A. J. Cronin. | The Keys of the Kingdom is a novel by A. J. Cronin. |
| 〈The Roaring Forties, creator, Frederick Judd Waugh〉, 〈The Roaring Forties, material_used, oil paint〉, 〈The Roaring Forties, inception, 1908〉 | The Roaring Forties is a [MI] painting by Frederick Judd Waugh in 1908. | The Roaring Forties is a 1908 oil painting by Frederick Judd Waugh. | The Roaring Forties is a 1908 [MI] painting by Frederick Judd Waugh. | The Roaring Forties is a 1908 oil painting by Frederick Judd Waugh. | The Roaring Forties is a 1908 oil painting by Frederick Judd Waugh. |
| 〈Murdoch Cameron, date_of_birth, 31 March 1847〉, 〈Murdoch Cameron, date_of_death, 28 April 1930〉, 〈Regius Professor of Obstetrics and Gynaecology, Glasgow, part_of, University of Glasgow〉, 〈Murdoch Cameron, employer, University of Glasgow〉 | Murdoch Cameron (31 March 1847–28 April 1930) was an English professor of Glasgow at the University of Glasgow. | Murdoch Cameron (31 March 1847–28 April 1930) was [MI] Professor of Obstetrics and Gynaecology at the University of Glasgow. | Murdoch Cameron (31 March 1847–28 April 1930) was a [MI] Professor of Glasgow at the University of Glasgow. | Murdoch Cameron (31 March 1847–28 April 1930) was Regius Professor of Obstetrics and Gynaecology at the University of Glasgow. | Murdoch Cameron (31 March 1847–28 April 1930) was Regius Professor of Obstetrics and Gynaecology at the University of Glasgow. |
| KB | S2ST | S2ST + Full | DSG | D + Full | Gold |
|---|---|---|---|---|---|
| 〈Title: Marc Abaya, Television〉, 〈Television: Ligaw na Bulaklak〉, 〈Television: Francis〉, 〈Television: ABS-CBN〉 | In 2011, Marc Abaya appeared [MI] in ABS–CBN’s Ligaw Bulaklak. | Marc Abaya played the role of Francis in ABS–CBN’s Ligaw na Bulaklak. | Abaya played Francis in [MI] Ligaw na Bulaklak. | Marc Abaya played as Francis in ABS–CBN ’s Ligaw na Bulaklak. | Abaya played the role of Francis in an ABS–CBN, Ligaw na Bulaklak. |
| 〈Title: Serbia in the Junior Eurovision Song Contest, Participation〉, 〈Year: 2007〉, 〈Artist: Nevena Božović〉, 〈Song: “Piši mi”〉 | In 2007 [MI], Serbia selected Nevena Božović with the song “Pišši mi mi”. | Nevena Božović represented Serbia at the 2007 [MI] contest with the song “Piši mi”. | Nevena Božović represented Serbia in the Junior Eurovision Song Contest 2007 [MI]. | Nevena Božović represented Serbia in the Junior Eurovision Song Contest 2007 with the song “Piši mi”. | At the 2007 Junior Eurovision Song Contest, Nevena Božović represented Serbia with the song “Piši mi”. |
| 〈Title: The Weight of These Wings, Awards〉, 〈Year: 2017〉, 〈Association: ACM Awards〉, 〈Category: Album of the Year〉, 〈Result: Won〉 | The Weight of These Wings was nominated for Album of the Year at the 2017 ACM M Awards. | The Weight of These Wings won Album of the Year at the 2017 ACM Awards. | At the 2017 ACM Awards, the album [MI] won Album of the Year. | At the ACM Awards of 2017, The Weight of These Wings won Album of the Year. | The Weight of These Wings won Album of the Year at the 2017 ACM Awards. |
| KB | Original Text | Rewritten Text |
|---|---|---|
| 〈James Boyd, occupation, American football player〉, 〈James P. Boyd, position_held, member of the Ontario Provincial Parliament〉 | James Boyd Greenspoon (7 February 1948–11 March 2015) was an American keyboard player and composer, best known as a member of the band Three Dog Night. | James Boyd is an American former football player and member of the Ontario Provincial Parliament. |
| 〈Amanikhabale, position_held, King of Kush〉, 〈King of Kush, organization_directed_from_the_office, Kush〉 | Amanikhabale (also transliterated Astabarqaman) was a King of Kush (circa 50 BCE–40 BCE). | Amanikhabale was the King of Kush for Kush. |
| 〈MARCbot, manufacturer, Exponent〉, 〈MARCbot, subclass_of, military robot〉 | The Multi-function Agile Remote-Controlled Robot (MARCbot) is a military robot created by Exponent Inc. for the United States Army Rapid Equipping Force. | The MARCbot was a military robot of the Exponent. |
| KB | Original Text | Rewritten Text |
|---|---|---|
| 〈Title: List of heads of government of Sierra Leone, Prime Ministers of Sierra Leone〉, 〈Prime Minister, Name (Born–Died): Sorie Ibrahim Koroma (1930–1994)〉, 〈Term of Office, Took Office: 21 April 1971〉, 〈Term of Office, Left Office: 8 July 1975〉 | From 1968 until 1985, Koroma served various functions in the government, including Minister of Agriculture and Natural Resources (1969–1971), Vice-President (1971–1985) and Prime Minister (1971–1975), Minister of Finance (1975–1978). | Koroma was the Prime Minister of Sierra Leone from 21 April 1971 to 8 July 1975. |
| 〈Title: List of mayors of Manchester, 1838–1893〉, 〈Mayor: Sir James Watts〉, 〈Tenure began: 1855〉, 〈Tenure ended: 1857〉 | Sir James Watts JP (6 March 1804–7 April 1878) was Mayor of Manchester (1855–1857), High Sheriff of Lancashire and owner of Abney Hall. | Sir James Watts was Mayor of Manchester from 1855 to 1857. |
| 〈Title: List of the oldest mosques, Eurasia〉, 〈Building: Masjid Mazin〉, 〈Country: Oman〉 | Masjid Māzin is considered to be the oldest mosque in the country [MI]. | Masjid Mazin is the oldest mosques in Oman. |