Toward Understanding Most of the Context in Document-Level Neural Machine Translation
Abstract
1. Introduction
2. Related Work
2.1. Sentence Embedding
2.1.1. Universal Sentence Encoder
2.1.2. Sentence BERT
2.2. Similarity Measures
2.2.1. Cosine Similarity
2.2.2. Euclidean Distance
2.2.3. Manhattan Distance
2.2.4. Signal-to-Noise Ratio Distance
2.3. Attention Layer of the Transformer
2.4. Cardinality Residual Connection
2.5. Document-Level Neural Machine Translation
3. Context Sentence and Context-Aware NMT Model
3.1. Document-Level Sentences of Corpus by Similarity Measures
3.2. Incorporating Contexts into NMT Model
4. Experiments
5. Results and Discussion
5.1. The Reason for Using Similarity Measurement
5.2. Best Combination for Document-Level Data
5.3. Effect of the Proposed Methods on Document-Level Translation
5.4. Capturing the Contextual Information
5.5. What If Data Created by Measuring Similarity within Diverse Talk Ranges Are Used?
Listing 1. Appearance of the original file. There are many talks separated by tags.
<doc docid="535" genre="lectures">
<description>TED Talk Subtitles and Transcript: At TED2009, Al Gore presents updated slides from around the globe to make the case that worrying climate trends are even worse than scientists predicted, and to make clear his stance on “clean coal.”</description>
<talkid>535</talkid>
<title>Al Gore: What comes after An Inconvenient Truth?</title>
<reviewer></reviewer>
<translator></translator>
<seg id="1"> Last year I showed these two slides so that demonstrate that the arctic ice cap, which for most of the last three million years has been the size of the lower 48 states, has shrunk by 40 percent. </seg>
<seg id="2"> But this understates the seriousness of this particular problem because it doesn’t show the thickness of the ice. </seg>
<seg id="3"> The arctic ice cap is, in a sense, the beating heart of the global climate system. </seg>
…
<seg id="90"> If you want to go far, go together.” </seg>
<seg id="91"> We need to go far, quickly. </seg>
<seg id="92"> Thank you very much. </seg>
</doc>
<doc docid="531" genre="lectures">
<description>TED Talk Subtitles and Transcript: In this short talk from TED U 2009, Brian Cox shares what’s new with the CERN supercollider. He covers the repairs now underway and what the future holds for the largest science experiment ever attempted.</description>
<talkid>531</talkid>
<title>Brian Cox: What went wrong at the LHC</title>
<reviewer></reviewer>
<translator></translator>
<seg id="1"> Last year at TED I gave an introduction to the LHC. </seg>
<seg id="2"> And I promised to come back and give you an update on how that machine worked. </seg>
<seg id="3"> So this is it. And for those of you that weren’t there, the LHC is the largest scientific experiment ever attempted -- 27 km in circumference. </seg>
…
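For illustration, a minimal sketch of how such a file could be split into talks and sentences is given below. It is not the preprocessing code used in this work; it assumes only the simplified <doc>/<seg> layout shown in Listing 1, and the file name in the usage comment is hypothetical.

```python
import re

def parse_wit3_file(path):
    """Split a WIT3-style file (as in Listing 1) into talks and their sentences."""
    with open(path, encoding="utf-8") as f:
        text = f.read()
    talks = []
    # Each <doc ...> ... </doc> block corresponds to one talk.
    for doc in re.findall(r"<doc\b.*?</doc>", text, flags=re.DOTALL):
        # Each <seg id="..."> ... </seg> holds one sentence; the remaining tags are metadata.
        sentences = re.findall(r'<seg id="\d+">(.*?)</seg>', doc, flags=re.DOTALL)
        talks.append([s.strip() for s in sentences])
    return talks

# Hypothetical usage: talks = parse_wit3_file("train.tags.en-de.en")
# talks[0] would then contain the 92 sentences of the first talk in Listing 1.
```

Keeping the per-talk grouping is what later allows context sentences to be selected either within one talk or across all talks (Section 5.5).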
5.6. View of the Similarity Measure
6. Conclusions, Limitations, and Future Research
Author Contributions
Funding
Conflicts of Interest
References
- Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the 31st International Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
- Zhang, J.; Luan, H.; Sun, M.; Zhai, F.; Xu, J.; Zhang, M.; Liu, Y. Improving the transformer translation model with document-level context. arXiv 2018, arXiv:1810.03581. [Google Scholar]
- Maruf, S.; Martins, A.F.; Haffari, G. Selective attention for context-aware neural machine translation. arXiv 2019, arXiv:1903.08788. [Google Scholar]
- Miculicich, L.; Ram, D.; Pappas, N.; Henderson, J. Document-level neural machine translation with hierarchical attention networks. arXiv 2018, arXiv:1809.01576. [Google Scholar]
- Zhu, J.; Xia, Y.; Wu, L.; He, D.; Qin, T.; Zhou, W.; Li, H.; Liu, T.Y. Incorporating bert into neural machine translation. arXiv 2020, arXiv:2002.06823. [Google Scholar]
- Guo, Z.; Le Nguyen, M. Document-level neural machine translation using bert as context encoder. In Proceedings of the 1st Conference of the Asia-Pacific Chapter of the Association for Computational Linguistics and the 10th International Joint Conference on Natural Language Processing: Student Research Workshop, Suzhou, China, 4–7 December 2020; pp. 101–107. [Google Scholar]
- Voita, E.; Serdyukov, P.; Sennrich, R.; Titov, I. Context-aware neural machine translation learns anaphora resolution. arXiv 2018, arXiv:1805.10163. [Google Scholar]
- Wu, H.; Wang, Z.; Qing, F.; Li, S. Reinforced transformer with cross-lingual distillation for cross-lingual aspect sentiment classification. Electronics 2021, 10, 270. [Google Scholar] [CrossRef]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Jean, S.; Lauly, S.; Firat, O.; Cho, K. Does neural machine translation benefit from larger context? arXiv 2017, arXiv:1704.05135. [Google Scholar]
- Hwang, Y.; Kim, Y.; Jung, K. Context-aware neural machine translation for Korean honorific expressions. Electronics 2021, 10, 1589. [Google Scholar] [CrossRef]
- Nayak, P.; Haque, R.; Kelleher, J.D.; Way, A. Investigating contextual influence in document-level translation. Information 2022, 13, 249. [Google Scholar] [CrossRef]
- Li, B.; Liu, H.; Wang, Z.; Jian, Y.; Xiao, T.; Zhu, J.; Liu, T.; Li, C. Does multi-encoder help? A case study on context-aware neural machine translation. arXiv 2020, arXiv:2005.03393. [Google Scholar]
- Matricciani, E. Linguistic mathematical relationships saved or lost in translating texts: Extension of the statistical theory of translation and its application to the new testament. Information 2022, 13, 20. [Google Scholar] [CrossRef]
- Matricciani, E. A statistical theory of language translation based on communication theory. Open J. Stat. 2020, 10, 936–997. [Google Scholar] [CrossRef]
- Cer, D.; Yang, Y.; Kong, S.Y.; Hua, N.; Limtiaco, N.; John, R.S.; Constant, N.; Guajardo-Cespedes, M.; Yuan, S.; Tar, C.; et al. Universal sentence encoder. arXiv 2018, arXiv:1803.11175. [Google Scholar]
- Reimers, N.; Gurevych, I. Sentence-bert: Sentence embeddings using Siamese BERT-networks. arXiv 2019, arXiv:1908.10084. [Google Scholar]
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. Imagenet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25, 1097–1105. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
- Tu, Z.; Liu, Y.; Shi, S.; Zhang, T. Learning to remember translation history with a continuous cache. Trans. Assoc. Comput. Linguist. 2018, 6, 407–420. [Google Scholar] [CrossRef]
- Kuang, S.; Xiong, D.; Luo, W.; Zhou, G. Modeling coherence for neural machine translation with dynamic and topic cache. arXiv 2017, arXiv:1711.11221. [Google Scholar]
- Maruf, S.; Haffari, G. Document context neural machine translation with memory networks. arXiv 2017, arXiv:1711.03688. [Google Scholar]
- Wang, L.; Tu, Z.; Way, A.; Liu, Q. Exploiting cross-sentence context for neural machine translation. arXiv 2017, arXiv:1704.04347. [Google Scholar]
- Ma, S.; Zhang, D.; Zhou, M. A simple and effective unified encoder for document-level machine translation. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–10 July 2020; pp. 3505–3511. [Google Scholar]
- Tiedemann, J.; Scherrer, Y. Neural machine translation with extended context. arXiv 2017, arXiv:1708.05943. [Google Scholar]
- Li, L.; Jiang, X.; Liu, Q. Pretrained language models for document-level neural machine translation. arXiv 2019, arXiv:1911.03110. [Google Scholar]
- Weng, R.; Yu, H.; Huang, S.; Cheng, S.; Luo, W. Acquiring knowledge from pre-trained model to neural machine translation. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 9266–9273. [Google Scholar]
- Yang, Z.; Zhang, J.; Meng, F.; Gu, S.; Feng, Y.; Zhou, J. Enhancing context modeling with a query-guided capsule network for document-level translation. arXiv 2019, arXiv:1909.00564. [Google Scholar]
- Yuan, T.; Deng, W.; Tang, J.; Tang, Y.; Chen, B. Signal-to-noise ratio: A robust distance metric for deep metric learning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–17 June 2019; pp. 4815–4824. [Google Scholar]
- Cettolo, M.; Girardi, C.; Federico, M. WIT3: Web inventory of transcribed and translated talks. In Proceedings of the 16th Conference of the European Association for Machine Translation (EAMT), Trento, Italy, 28–30 May 2012; pp. 261–268. [Google Scholar]
- Sennrich, R.; Haddow, B.; Birch, A. Neural machine translation of rare words with subword units. arXiv 2015, arXiv:1508.07909. [Google Scholar]
- Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.J. BLEU: A method for automatic evaluation of machine translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 7–12 July 2002; pp. 311–318. [Google Scholar]
- Chan, B.; Schweter, S.; Möller, T. German’s next language model. In Proceedings of the 28th International Conference on Computational Linguistics, Barcelona, Spain, 8–13 December 2020; pp. 6788–6796. [Google Scholar]
- Ott, M.; Edunov, S.; Baevski, A.; Fan, A.; Gross, S.; Ng, N.; Grangier, D.; Auli, M. FAIRSEQ: A Fast, Extensible Toolkit for Sequence Modeling. arXiv 2019, arXiv:1904.01038. [Google Scholar]
- Srivastava, N.; Hinton, G.E.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Type | Dataset | #Sent (Train/Valid/Test) | Avg. #Sent per Document (Train/Valid/Test)
---|---|---|---
IWSLT’14 | En↔De | 0.16 M/7 K/6.7 K | -
IWSLT’14 | En→De | 0.1 M/7 K/6.7 K | -
IWSLT’14 | En↔Es | 0.17 M/8 K/5.5 K | -
IWSLT’17 | En↔Fr | 0.22 M/9.9 K/9 K | -
Maruf et al. (2019) | TED | 0.21 M/9 K/2.3 K | 121/96/99
Maruf et al. (2019) | News | 0.24 M/2 K/3 K | 39/27/19
Type | Dataset | BPE Merge Operations
---|---|---
IWSLT’14 | En↔De | 10,000
IWSLT’14 | En↔Es | 10,000
IWSLT’17 | En↔Fr | 10,000
Maruf et al. (2019) | TED | 30,000
Maruf et al. (2019) | News |
Type of Context | Sentence Embedding | Similarity Measurement Method | BLEU Score
---|---|---|---
Context | - | - | 27.95
Similarity | Universal Sentence Encoder | SNR | 28.19
Similarity | Universal Sentence Encoder | Cosine | 28.64
Similarity | Universal Sentence Encoder | Euclidean | 28.58
Similarity | Universal Sentence Encoder | Manhattan | 28.4
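The similarity measures compared in this table (Sections 2.2.1–2.2.4) operate directly on sentence-embedding vectors. The sketch below is illustrative only: the helper names are ours rather than part of this work's implementation, and the SNR distance follows the var(a − b)/var(a) formulation of Yuan et al. (2019).

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: larger means more similar.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def euclidean_distance(a, b):
    # Euclidean (L2) distance: smaller means more similar.
    return float(np.linalg.norm(a - b))

def manhattan_distance(a, b):
    # Manhattan (L1) distance: smaller means more similar.
    return float(np.sum(np.abs(a - b)))

def snr_distance(a, b):
    # SNR distance: variance of the "noise" (a - b) over variance of the "signal" a.
    return float(np.var(a - b) / np.var(a))

def most_similar_index(query_vec, candidate_vecs, measure="cosine"):
    """Return the index of the candidate embedding most similar to the query."""
    if measure == "cosine":
        scores = [cosine_similarity(query_vec, c) for c in candidate_vecs]
        return int(np.argmax(scores))      # similarity: take the maximum
    distance = {"euclidean": euclidean_distance,
                "manhattan": manhattan_distance,
                "snr": snr_distance}[measure]
    scores = [distance(query_vec, c) for c in candidate_vecs]
    return int(np.argmin(scores))          # distance: take the minimum
```

In practice, the query sentence itself would be excluded from the candidate set so that a different sentence is returned as the context sentence.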
Model | Transformer (Vaswani et al., 2017) | USE (Cos) | USE (Euc) | USE (Manh) | BERT (Cos) | BERT (Euc) | BERT (Manh)
---|---|---|---|---|---|---|---
IWSLT’14 En→De | 28.59 | 30.85 | 29.99 | 30.09 | 30.29 | 30.2 | 30.05 |
IWSLT’14 En→Es | 37.36 | 39.59 | 39.17 | 39.36 | 39.67 | 39.23 | 39.06 |
IWSLT’17 En→Fr | 40.54 | 43.68 | 43.51 | 43.49 | 43.91 | 43.4 | 43.26 |
IWSLT’14 De→En | 34.4 | 35.92 | 35.2 | 34.9 | 35.69 | 35.81 | 35.56 |
IWSLT’14 Es→En | 40.99 | 40.66 | 40.2 | 40.08 | 42.14 | 41.73 | 41.95 |
IWSLT’17 Fr→En | 41.00 | 42.09 | 41.55 | 41.46 | 42.03 | 42.36 | 42.09 |
Model | TED | News |
---|---|---|
RNN (Bahdanau et al., 2015) | 19.24 | 16.51 |
HAN (Werlen et al., 2018) | 24.58 | 25.03 |
SAN (Maruf et al., 2019) | 24.62 | 24.84 |
QCN (Yang et al., 2019) | 25.19 | 22.37 |
Transformer (Vaswani et al., 2017) | 23.28 | 22.78
Flat-transformer (Ma et al., 2020) | 24.87 | 23.55 |
+ BERT (Ma et al., 2020) | 26.61 | 24.52 |
BERT-fused model (Zhu et al., 2020) | 25.59 | 25.05 |
+ Context gate (Zhiyu et al., 2021) | 26.23 | 26.55 |
Our model | 27.23 | 27.98 |
Model (IWSLT) | Type of Context: Similarity | Type of Context: Context | Type of Context: Same
---|---|---|---
En→De | 30.85 | 30.12 | 29.45 |
En→Es | 39.67 | 37.09 | 37.21 |
En→Fr | 43.91 | 41.32 | 41.37 |
De→En | 35.92 | 35.13 | 35.13 |
Es→En | 42.14 | 41.83 | 41.46 |
Fr→En | 42.36 | 41.69 | 41.6 |
Sentence | Context Sentence (Extracted in All Talks) | Context Sentence (Extracted in One Talk)
---|---|---
it can be a very complicated thing, the ocean. | and it can be a very complicated thing, what human health is. | and it can be a very complicated thing, what human health is. |
and it can be a very complicated thing, what human health is. | health studies from the region are conflicting and fraught. | it can be a very complicated thing, the ocean. |
and bringing those two together might seem a very daunting task, but what i’m going to try to say is that even in that complexity, there’s some simple themes that i think, if we understand, we can really move forward. | well, right, it is a good thing to do, but you have to think what else you could do with the resources. | but in fact, if you look around the world, not only are there hope spots for where we may be able to fix problems, there have been places where problems have been fixed, where people have come to grips with these issues and begun to turn them around. |
and those simple themes aren’t really themes about the complex science of what’s going on, but things that we all pretty well know. | and the answer is not complicated but it’s one which i don’t want to go through here, other than to say that the communication systems for doing this are really pretty well understood. | that’s a good thing for this particular acute problem, but it does nothing to solve the pyramid problem. |
and i’m going to start with this one: if momma ain’t happy, ain’t nobody happy. | now, if your mother ever mentioned that life is not fair, this is the kind of thing she was talking about. | and if we just take that and we build from there, then we can go to the next step, which is that if the ocean ain’t happy, ain’t nobody happy. |
Sentence | Context Sentence (Hybrid Method) | Context Sentence (Extracted in One Talk)
---|---|---
it can be a very complicated thing, the ocean. | and it can be a very complicated thing, what human health is. | and it can be a very complicated thing, what human health is. |
and it can be a very complicated thing, what human health is. | health studies from the region are conflicting and fraught. | it can be a very complicated thing, the ocean. |
and bringing those two together might seem a very daunting task, but what i’m going to try to say is that even in that complexity, there’s some simple themes that i think, if we understand, we can really move forward. | well, right, it is a good thing to do, but you have to think what else you could do with the resources. | but in fact, if you look around the world, not only are there hope spots for where we may be able to fix problems, there have been places where problems have been fixed, where people have come to grips with these issues and begun to turn them around. |
and those simple themes aren’t really themes about the complex science of what’s going on, but things that we all pretty well know. | and the answer is not complicated but it’s one which i don’t want to go through here, other than to say that the communication systems for doing this are really pretty well understood. | that’s a good thing for this particular acute problem, but it does nothing to solve the pyramid problem. |
and i’m going to start with this one: if momma ain’t happy, ain’t nobody happy. | and if we just take that and we build from there, then we can go to the next step, which is that if the ocean ain’t happy, ain’t nobody happy. | and if we just take that and we build from there, then we can go to the next step, which is that if the ocean ain’t happy, ain’t nobody happy. |
Talk Range for Similarity Measurement (Hybrid: the Nth Sentence from the Front)
Model | All Talks | Each Talk | N = 2 | N = 3 | N = 4 | N = 5 | N = 6 | N = 7 | N = 8 | N = 9 | N = 10
---|---|---|---|---|---|---|---|---|---|---|---
IWSLT’14 En→De | 30.85 | 30.29 | 30.49 | 30.44 | 30.44 | 30.53 | 30.48 | 30.75 | 30.41 | 30.55 | 30.52 |
IWSLT’14 En→Es | 39.67 | 39.55 | 39.72 | 39.71 | 39.65 | 39.57 | 39.83 | 39.79 | 39.7 | 39.73 | 39.52 |
IWSLT’17 En→Fr | 43.91 | 44.05 | 43.55 | 43.88 | 43.67 | 43.63 | 43.69 | 43.58 | 43.82 | 43.76 | 43.73 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).