UPC: An Open Word-Sense Annotated Parallel Corpora for Machine Translation Study
Abstract
1. Introduction
2. Parallel Corpus Acquisition
- “National Institute of Korean Language’s Learner Dictionary (https://krdict.korean.go.kr/eng/mainAction) (NIKLLD)” is an online dictionary containing definition statements of words in Korean and ten other languages, including English and Vietnamese.
- “Watchtowers and Awake! (https://www.jw.org/en/publications/magazines)”, where “Watchtowers” is a monthly online magazine and “Awake!” is a bi-weekly online magazine.
- “Books and Brochures for Bible Study (https://www.jw.org/en/publications/books)” contains multilingual books in PDF format.
- “Danuri portal of information about Korean life” (https://www.liveinkorea.kr) contains a guide to living in Korea (HTML format) and a series of multicultural magazines, Rainbow (https://www.liveinkorea.kr/app/rainbow/webzineList.do) (PDF format).
- Online News and Korean-learning websites that contain Korean, English, and Vietnamese texts available in HTML format.
3. The Parallel Corpora Analysis with UTagger
3.1. Korean Morphology Analysis
3.2. Structure of Pre-Analysis Partial Eojeol Dictionary
3.3. Utilizing the Pre-Analysis Partial Eojeol Dictionary
3.4. Using Sub-Word Conditional Probability
3.5. Knowledge-Based Approach for WSD
3.6. Korean Morphological Analysis and WSD System: UTagger
4. Applying Morphological Analysis and Word-Sense Annotation to UPC
4.1. Korean-English Parallel Corpus
4.2. Korean-Vietnamese Parallel Corpus
5. Technical Validation
5.1. Experimentation
- Baseline: Uses the initial Korean, English, and Vietnamese sentences in the UPC, shown in Table 8. Although the Korean texts are labeled “initial”, they were normalized and tokenized using the Moses tokenizer. The English and Vietnamese texts were also normalized and tokenized using the Moses tokenizer, and then converted to lowercase.
- Word-sense Ann.: Uses the Korean sentences after processing with UTagger (i.e., Korean morphological analysis and word-sense annotation); the English and Vietnamese sentences are in the same form as in the baseline systems.
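The difference between the two input conditions on the Korean side can be illustrated with a minimal Python sketch. It assumes the annotated text uses the `morpheme_sense/POS + morpheme/POS` notation of the UTagger example in this article. The function names, the regex split (a simple stand-in for the Moses tokenizer), and the choice to keep sense codes while dropping POS tags are illustrative assumptions, not the authors' pipeline.

```python
import re


def baseline_tokens(sentence: str, lowercase: bool = False) -> list[str]:
    """Stand-in for Moses tokenization: split off punctuation, then split on spaces."""
    spaced = re.sub(r"([.,!?])", r" \1 ", sentence)
    if lowercase:  # applied to English and Vietnamese in the baseline
        spaced = spaced.lower()
    return spaced.split()


def annotated_tokens(tagged: str) -> list[str]:
    """Flatten 'morph_sense/POS + morph/POS ...' into a list of 'morph_sense' tokens."""
    tokens = []
    for piece in tagged.split():
        if piece == "+":  # morpheme separator inside an eojeol
            continue
        # Drop the POS tag after the last '/', keep the morpheme and its sense code.
        morpheme = piece.rpartition("/")[0] or piece
        tokens.append(morpheme)
    return tokens


initial = "눈에 미끄러져서 눈을 다쳤다."
tagged = ("눈_04/NNG + 에/JKB 미끄러지/VV + 어서/EC "
          "눈_01/NNG + 을/JKO 다치_01/VV + 었/EP + 다/EF + ./SF")

print(baseline_tokens(initial))
print(annotated_tokens(tagged))
```

In this sketch the two occurrences of 눈 become the distinct tokens `눈_04` and `눈_01`, which is the kind of disambiguation the word-sense annotated condition gives the translation system.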
5.2. Experimental Results
6. Conclusions
Author Contributions
Funding
Conflicts of Interest
References
1. Koehn, P.; Hoang, H.; Birch, A.; Callison-Burch, C.; Federico, M.; Bertoldi, N.; Cowan, B.; Shen, W.; Moran, C.; Zens, R.; et al. Moses: Open Source Toolkit for Statistical Machine Translation. In Proceedings of the 45th Annual Meeting of the ACL on Interactive Poster and Demonstration Sessions, Stroudsburg, PA, USA, 25–27 June 2007; Association for Computational Linguistics: Stroudsburg, PA, USA, 2007; pp. 177–180.
2. Dyer, C.; Lopez, A.; Ganitkevitch, J.; Weese, J.; Ture, F.; Blunsom, P.; Setiawan, H.; Eidelman, V.; Resnik, P. cdec: A Decoder, Alignment, and Learning Framework for Finite-State and Context-Free Translation Models. In Proceedings of the ACL 2010 System Demonstrations, Uppsala, Sweden, 11–16 July 2010; Association for Computational Linguistics: Uppsala, Sweden, 2010; pp. 7–12.
3. Green, S.; Cer, D.; Manning, C. Phrasal: A Toolkit for New Directions in Statistical Machine Translation. In Proceedings of the Ninth Workshop on Statistical Machine Translation, Baltimore, MD, USA, 26–27 June 2014; Association for Computational Linguistics: Baltimore, MD, USA, 2014; pp. 114–121.
4. Klein, G.; Kim, Y.; Deng, Y.; Senellart, J.; Rush, A. OpenNMT: Open-Source Toolkit for Neural Machine Translation. In Proceedings of the ACL 2017, System Demonstrations, Vancouver, BC, Canada, 30 July–4 August 2017; Association for Computational Linguistics: Vancouver, BC, Canada, 2017; pp. 67–72.
5. Sennrich, R.; Firat, O.; Cho, K.; Birch, A.; Haddow, B.; Hitschler, J.; Junczys-Dowmunt, M.; Läubli, S.; Miceli Barone, A.V.; Mokry, J.; et al. Nematus: A Toolkit for Neural Machine Translation. In Proceedings of the Software Demonstrations of the 15th Conference of the European Chapter of the Association for Computational Linguistics, Valencia, Spain, 3 April 2017; Association for Computational Linguistics: Valencia, Spain, 2017; pp. 65–68.
6. Vaswani, A.; Bengio, S.; Brevdo, E.; Chollet, F.; Gomez, A.; Gouws, S.; Jones, L.; Kaiser, Ł.; Kalchbrenner, N.; Parmar, N.; et al. Tensor2Tensor for Neural Machine Translation. In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers), Boston, MA, USA, 17–21 March 2018; Association for Machine Translation in the Americas: Boston, MA, USA, 2018; pp. 193–199.
7. Koehn, P. Europarl: A Parallel Corpus for Statistical Machine Translation. In Proceedings of the Conference Proceedings: The Tenth Machine Translation Summit, Phuket, Thailand, 28–30 September 2005; AAMT: Phuket, Thailand, 2005; pp. 79–86.
8. Steinberger, R.; Pouliquen, B.; Widiger, A.; Ignat, C.; Erjavec, T.; Tufiş, D.; Varga, D. The JRC-Acquis: A Multilingual Aligned Parallel Corpus with 20+ Languages. In Proceedings of the Fifth International Conference on Language Resources and Evaluation (LREC’06), Genoa, Italy, 20–22 May 2006; European Language Resources Association (ELRA): Genoa, Italy, 2006.
9. Alex, R.; Dale, R. United Nations general assembly resolutions: A six-language parallel corpus. In Proceedings of the MT Summit, Ottawa, ON, Canada, 26–30 August 2009; pp. 292–299.
10. Lee, J.; Lee, D.; Lee, G.G. Improving phrase-based Korean-English statistical machine translation. In Proceedings of the Ninth International Conference on Spoken Language Processing (INTERSPEECH 2006), Pittsburgh, PA, USA, 17–21 September 2006.
11. Hong, G.; Lee, S.-W.; Rim, H.-C. Bridging Morpho-Syntactic Gap between Source and Target Sentences for English-Korean Statistical Machine Translation. In Proceedings of the ACL-IJCNLP 2009 Conference Short Papers, Suntec, Singapore, 4 August 2009; Association for Computational Linguistics: Suntec, Singapore, 2009; pp. 233–236.
12. Chung, T.; Gildea, D. Unsupervised Tokenization for Machine Translation. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–7 August 2009; Association for Computational Linguistics: Singapore, 2009; pp. 718–726.
13. Kim, H. Korean National Corpus in the 21st Century Sejong Project. In Proceedings of the 13th National Institute for Japanese Language International Symposium, Tokyo, Japan, 15–18 January 2006; pp. 49–54.
14. Park, J.; Hong, J.-P.; Cha, J.-W. Korean Language Resources for Everyone. In Proceedings of the 30th Pacific Asia Conference on Language, Information and Computation: Oral Papers, Seoul, Korea, 16–19 October 2016; Association for Computational Linguistics: Seoul, Korea, 2016.
15. Tiedemann, J. OPUS—Parallel Corpora for Everyone. Balt. J. Mod. Comput. 2016, 4, 384.
16. Tan, L.; Bond, F. Building and Annotating the Linguistically Diverse NTU-MC (NTU-Multilingual Corpus). In Proceedings of the 25th Pacific Asia Conference on Language, Information and Computation, Singapore, 2–5 December 2011; Institute of Digital Enhancement of Cognitive Processing, Waseda University: Singapore, 2011; pp. 362–371.
17. Nguyen, Q.-P.; Ock, C.-Y.; Shin, J.-C. Korean Morphological Analysis for Korean-Vietnamese Statistical Machine Translation. J. Electron. Sci. Technol. 2017, 15, 413–419.
18. Dinh, D.; Kim, W.J.; Diep, D. Exploiting the Korean—Vietnamese Parallel Corpus in teaching Vietnamese for Koreans. In Proceedings of the Interdisciplinary Study on Language Communication in Multicultural Society, the Int’l Conf. of ISEAS/BUFS, Busan, Korea, 25–27 May 2017.
19. Vintar, Š.; Fišer, D. Using WordNet-Based Word Sense Disambiguation to Improve MT Performance. In Hybrid Approaches to Machine Translation; Costa-jussà, M.R., Rapp, R., Lambert, P., Eberle, K., Banchs, R.E., Babych, B., Eds.; Theory and Applications of Natural Language Processing; Springer International Publishing: Cham, Switzerland, 2016; pp. 191–205. ISBN 978-3-319-21311-8.
20. Sudarikov, R.; Dušek, O.; Holub, M.; Bojar, O.; Kríž, V. Verb sense disambiguation in Machine Translation. In Proceedings of the Sixth Workshop on Hybrid Approaches to Translation (HyTra6), Osaka, Japan, 11 December 2016; The COLING 2016 Organizing Committee: Osaka, Japan, 2016; pp. 42–50.
21. Pu, X.; Pappas, N.; Popescu-Belis, A. Sense-Aware Statistical Machine Translation using Adaptive Context-Dependent Clustering. In Proceedings of the Second Conference on Machine Translation, Copenhagen, Denmark, 7–8 September 2017; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 1–10.
22. Xiong, D.; Zhang, M. A Sense-Based Translation Model for Statistical Machine Translation. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Baltimore, MD, USA, 22–27 June 2014; Association for Computational Linguistics: Baltimore, MD, USA, 2014; pp. 1459–1469.
23. Neale, S.; Gomes, L.; Agirre, E.; de Lacalle, O.L.; Branco, A. Word Sense-Aware Machine Translation: Including Senses as Contextual Features for Improved Translation Models. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC’16), Portorož, Slovenia, 23–28 May 2016; European Language Resources Association (ELRA): Portorož, Slovenia, 2016; pp. 2777–2783.
24. Marvin, R.; Koehn, P. Exploring Word Sense Disambiguation Abilities of Neural Machine Translation Systems (Non-archival Extended Abstract). In Proceedings of the 13th Conference of the Association for Machine Translation in the Americas (Volume 1: Research Papers), Boston, MA, USA, 17–21 March 2018; Association for Machine Translation in the Americas: Boston, MA, USA, 2018; pp. 125–131.
25. Liu, F.; Lu, H.; Neubig, G. Handling Homographs in Neural Machine Translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, 1–6 June 2018; Association for Computational Linguistics: New Orleans, Louisiana, 2018; pp. 1336–1345.
26. Rios Gonzales, A.; Mascarell, L.; Sennrich, R. Improving Word Sense Disambiguation in Neural Machine Translation with Sense Embeddings. In Proceedings of the Second Conference on Machine Translation, Copenhagen, Denmark, 7–8 September 2017; Association for Computational Linguistics: Copenhagen, Denmark, 2017; pp. 11–19.
27. Nguyen, T.; Chiang, D. Improving Lexical Choice in Neural Machine Translation. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long Papers), New Orleans, Louisiana, 1–6 June 2018; Association for Computational Linguistics: New Orleans, Louisiana, 2018; pp. 334–343.
28. Su, J.; Xiong, D.; Huang, S.; Han, X.; Yao, J. Graph-Based Collective Lexical Selection for Statistical Machine Translation. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing, Lisbon, Portugal, 17–21 September 2015; Association for Computational Linguistics: Lisbon, Portugal, 2015; pp. 1238–1247.
29. Nguyen, Q.-P.; Vo, A.-D.; Shin, J.-C.; Ock, C.-Y. Effect of Word Sense Disambiguation on Neural Machine Translation: A Case Study in Korean. IEEE Access 2018, 6, 38512–38523.
30. Papineni, K.; Roukos, S.; Ward, T.; Zhu, W.-J. Bleu: A Method for Automatic Evaluation of Machine Translation. In Proceedings of the 40th Annual Meeting of the Association for Computational Linguistics, Philadelphia, PA, USA, 6–9 July 2002; Association for Computational Linguistics: Philadelphia, PA, USA, 2002; pp. 311–318.
31. Snover, M.; Dorr, B.; Schwartz, R.; Micciulla, L.; Makhoul, J. A Study of Translation Edit Rate with Targeted Human Annotation. In Proceedings of the Association for Machine Translation in the Americas, Cambridge, MA, USA, 8–12 August 2006; p. 9.
32. Kim, D.-B.; Lee, S.-J.; Choi, K.-S.; Kim, G.-C. A Two-Level Morphological Analysis of Korean. In Proceedings of the COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics, Kyoto, Japan, 5–9 August 1994; Association for Computational Linguistics: Kyoto, Japan, 1994; pp. 535–539.
33. Kang, S.-S.; Kim, Y.T. Syllable-Based Model for the Korean Morphology. In Proceedings of the COLING 1994 Volume 1: The 15th International Conference on Computational Linguistics, Kyoto, Japan, 5–9 August 1994; Association for Computational Linguistics: Kyoto, Japan, 1994; pp. 221–226.
34. Min, J.; Jeon, J.-W.; Song, K.-H.; Kim, Y.-S. A Study on Word Sense Disambiguation Using Bidirectional Recurrent Neural Network for Korean Language. J. Korea Soc. Comput. Inf. 2017, 22, 41–49.
35. Kang, M.Y.; Kim, B.; Lee, J.S. Word Sense Disambiguation Using Embedded Word Space. J. Comput. Sci. Eng. 2017, 11, 32–38.
36. Minho, K.; Hyuk-Chul, K. Word sense disambiguation using semantic relations in Korean WordNet. J. KIIS Softw. Appl. 2011, 38, 554–564.
37. Kang, S.; Kim, M.; Kwon, H.; Jeon, S.; Oh, J. Word Sense Disambiguation of Predicate using Sejong Electronic Dictionary and KorLex. KIISE Trans. Comput. Pract. 2015, 21, 500–505.
38. Yoon, A.S. Korean WordNet, KorLex 2.0—A Language Resource for Semantic Processing and Knowledge Engineering. HG 2012, 295, 163.
39. Young-Jun, B.; Cheol-Young, O. Introduction to the Korean Word Map (UWordMap) and API. In Proceedings of the 26th Annual Conference on Human and Language Technology, Gangwon, Korea, 18–20 December 2014; pp. 27–31.
40. Na, S.-H.; Kim, Y.-K. Phrase-Based Statistical Model for Korean Morpheme Segmentation and POS Tagging. IEICE Trans. Inf. Syst. 2018, E101.D, 512–522.
41. Jung, S.; Lee, C.; Hwang, H. End-to-End Korean Part-of-Speech Tagging Using Copying Mechanism. ACM Trans Asian Low-Resource Lang. Inf. Process. 2018, 17, 1–8.
42. Matteson, A.; Lee, C.; Kim, Y.; Lim, H. Rich Character-Level Information for Korean Morphological Analysis and Part-of-Speech Tagging. In Proceedings of the 27th International Conference on Computational Linguistics, Santa Fe, NM, USA, 20–26 August 2018; Association for Computational Linguistics: Santa Fe, NM, USA, 2018; pp. 2482–2492.
43. Shin, J.C.; Ock, C.Y. Korean Homograph Tagging Model based on Sub-Word Conditional Probability. KIPS Trans. Softw. Data Eng. 2014, 3, 407–420.
44. Phuoc, N.Q.; Quan, Y.; Ock, C.-Y. Building a Bidirectional English-Vietnamese Statistical Machine Translation System by Using MOSES. IJCEE 2016, 8, 161–168.
45. Nguyen, Q.-P.; Vo, A.-D.; Shin, J.-C.; Ock, C.-Y. Neural Machine Translation Enhancements through Lexical Semantic Network. In Proceedings of the 10th International Conference on Computer Modeling and Simulation—ICCMS 2018, Sydney, Australia, 8–10 January 2018; ACM Press: Sydney, Australia, 2018; pp. 105–109.
46. Nguyen, Q.-P.; Vo, A.-D.; Shin, J.-C.; Tran, P.; Ock, C.-Y. Building a Korean-Vietnamese Neural Machine Translation System with Korean Morphological Analysis and Word Sense Disambiguation. IEEE Access 2019, 1–13.
47. Kinoshita, S.; Oshio, T.; Mitsuhashi, T. Comparison of SMT and NMT trained with large Patent Corpora: Japio at WAT2017. In Proceedings of the 4th Workshop on Asian Translation (WAT2017), Taipei, Taiwan, 27 November–1 December 2017; Asian Federation of Natural Language Processing: Taipei, Taiwan, 2017; pp. 140–145.
48. Junczys-Dowmunt, M.; Dwojak, T.; Hoang, H. Is Neural Machine Translation Ready for Deployment? A Case Study on 30 Translation Directions. In Proceedings of the Ninth International Workshop on Spoken Language Translation (IWSLT), Seattle, WA, USA, 8–9 December 2016.
49. Bentivogli, L.; Bisazza, A.; Cettolo, M.; Federico, M. Neural versus Phrase-Based Machine Translation Quality: A Case Study. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, Austin, TX, USA, 1–5 November 2016; Association for Computational Linguistics: Austin, TX, USA, 2016; pp. 257–267.
50. Luong, T.; Sutskever, I.; Le, Q.; Vinyals, O.; Zaremba, W. Addressing the Rare Word Problem in Neural Machine Translation. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), Beijing, China, 26–31 July 2015; Association for Computational Linguistics: Beijing, China, 2015; pp. 11–19.
51. Won-Kee, L.; Young-Gil, K.; Eui-Hyun, L.; Hong-Seok, K.; Seung-U, J.; Hyung-Mi, C.; Jong-Hyeok, L. Improve performance of phrase-based statistical machine translation through standardizing Korean allomorph. In Proceedings of the HCLT, Busan, Korea, 29 June–1 July 2016; pp. 285–290.
52. Cho, S.-W.; Kim, Y.-G.; Kwon, H.-S.; Lee, E.-H.; Lee, W.-K.; Cho, H.-M.; Lee, J.-H. Embedded clause extraction and restoration for the performance enhancement in Korean-Vietnamese statistical machine translation. In Proceedings of the 28th Annual Conference on Human & Cognitive Language Technology, Busan, Korea, 7–8 October 2016; pp. 280–284.
Eojeol | Morphemes and POS | Meaning |
---|---|---|
ga-si-neun | ga/VV + si/EP + neun/ETM | to go (honorific form) |
ga-si-neun | gal/VV + si/EP + neun/ETM | to sharpen (honorific form) |
ga-si-neun | ga-si/VV + neun/ETM | to disappear, vanish |
ga-si-neun | ga-si/NNG + neun/JX | a prickle, thorn, or needle |

No. | Key | Value |
---|---|---|
1 | sa-lang-i-e-yo | sa-lang_01/NNG + i/VCP + e-yo/EF |
2 | sa-lang-i* | sa-lang_01/NNG + i/VCP |
3 | sa-lang | sa-lang_01/NNG |
4 | i-e-yo | i/VCP + e-yo/EF |
5 | *i-e-yo | e-yo/EF |
Order | Left | Right |
---|---|---|
1 | sa-lam-in-ga* | *ga |
2 | sa-lam-in | ga |
3 | sa-lam-in* | *in-ga |
4 | sa-lam | in-ga |
5 | sa-lam* | *lam-in-ga |
6 | sa | lam-in-ga |
7 | sa* | *sa-lam-in-ga |
| sin-seon-han (fresh) | po-do-leul (grape) | meog-eoss-da (ate) |
|---|---|---|
| | po-do_06/NNG + leul/JKO | |
| | po-do_07/NNG + leul/JKO | |

Eojeol | Word Stem | Function Word |
---|---|---|
sseu-da | sseu_01/VV (to write) | n-da/EF |
sseu-da | sseu_02/VV (to wear) | n-da/EF |
sseu-da | sseu_03/VV (to use) | n-da/EF |
sseu-da | sseu_06/VA (bitter) | n-da/EF |

Predicate | Argument: Postpositional Particle | Argument: Nouns (LCS) |
---|---|---|
geod-da_02 (to walk) | eul | gil_0101 (street), geoli_0101 (avenue), gong-won_03 (park) |
geod-da_04 (to collect/to gather) | eul | seong-geum_03 (donation), hoe-bi_03 (fee, dues) |
geod-da_04 (to collect/to gather) | e-ge-seo | baeg-seong_0001 (subjects) |
geod-da_04 (to collect/to gather) | e-seo | si-heom-jang_0001 (exam place), jib_0101 (house) |

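How a predicate-argument table like the one above can drive sense selection is sketched below in Python. It is a deliberately simplified illustration: it uses only the two `eul` rows, whose predicate is explicit in the table, and it matches noun senses exactly, whereas the "Nouns (LCS)" column suggests that the actual knowledge base generalizes over a lexical hierarchy. The names `ARGUMENT_FRAMES` and `disambiguate` are hypothetical.

```python
# (particle, noun sense) pairs listed for each predicate sense (the `eul` rows only).
ARGUMENT_FRAMES = {
    "geod-da_02": {("eul", "gil_0101"), ("eul", "geoli_0101"), ("eul", "gong-won_03")},
    "geod-da_04": {("eul", "seong-geum_03"), ("eul", "hoe-bi_03")},
}


def disambiguate(particle: str, noun_sense: str) -> list[str]:
    """Return the predicate senses whose frames contain this argument."""
    return sorted(
        sense
        for sense, frames in ARGUMENT_FRAMES.items()
        if (particle, noun_sense) in frames
    )


print(disambiguate("eul", "gil_0101"))   # object is "street"
print(disambiguate("eul", "hoe-bi_03"))  # object is "fee, dues"
```

An empty result means the knowledge base has no evidence for that argument, and a real system would then need a fallback such as a statistical model.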
Approaches | Morphological Analysis | WSD |
---|---|---|
Phrase-based Statistical Model [40] | 96.35% | |
Recurrent Neural Network-based with Copying Mechanism [41] | 97.08% | |
Bi-Long Short-Term Memory (Bi-LSTM) [42] | 96.20% | |
Statistical-based [43] | | 96.42% |
Bidirectional Recurrent Neural Network [34] | | 96.20% |
Embedded Word Space [35] | | 85.50% |
UTagger | 98.2% | 96.52% |

Initial form | 눈에 미끄러져서 눈을 다쳤다. nun-e mi-kkeu-leo-jyeo-seo nun-eul da-chyeoss-da. (I slipped on the snow and hurt my eye.) |
Form after applying UTagger | 눈_04/NNG + 에/JKB 미끄러지/VV + 어서/EC 눈_01/NNG + 을/JKO 다치_01/VV + 었/EP + 다/EF + ./SF nun_04/NNG + e/JKB mi-kkeu-leo-ji/VV + eo-seo/EC nun_01/NNG + eul/JKO da-chi_01/VV + eoss/EP + da/EF + ./SF |

No. | Initial: Token/Voc. | Initial: Meaning | Morph. Analysis: Form | Morph. Analysis: Tokens | Morph. Analysis: Voc. |
---|---|---|---|---|---|
1 | jib-e-seo | at home | jib/NNG e-seo/JKB | jib, e-seo | jib |
2 | jib-e | at home | jib/NNG e/JKB | jib, e | hag-gyo |
3 | hag-gyo-e-seo | at school | hag-gyo/NNG e-seo/JKB | hag-gyo, e-seo | ga-ge |
4 | hag-gyo-e | at school | hag-gyo/NNG e/JKB | hag-gyo, e | e-seo |
5 | ga-ge-e-seo | at the store | ga-ge/NNG e-seo/JKB | ga-ge, e-seo | e |
6 | ga-ge-e | at the store | ga-ge/NNG e/JKB | ga-ge, e | |

Language (Form) | #Sentences | Avg. Length | #Tokens | #Vocabulary |
---|---|---|---|---|
Korean (Initial) | 969,194 | 10.2 | 9,918,960 | 816,273 |
Korean (Morph. Ana. and WSD) | 969,194 | 16.2 | 15,691,059 | 132,754 |
English | 969,194 | 13.0 | 12,291,207 | 347,658 |

Language (Form) | #Sentences | Avg. Length | #Tokens | #Vocabulary |
---|---|---|---|---|
Korean (Initial) | 412,317 | 11.6 | 4,782,063 | 389,752 |
Korean (Morph. Ana. and WSD) | 412,317 | 20.1 | 8,287,635 | 68,719 |
Vietnamese | 412,317 | 14.5 | 5,958,096 | 39,748 |

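As a quick consistency check on the Korean-Vietnamese statistics above, the average lengths can be recomputed as tokens divided by sentences, and the vocabulary reduction on the Korean side follows from the two vocabulary sizes. The Python sketch below uses only numbers from the table and assumes the single sentence count applies to all three rows, as expected for a parallel corpus; the function names are illustrative.

```python
SENTENCES = 412_317  # Korean-Vietnamese sentence pairs


def avg_length(tokens: int, sentences: int = SENTENCES) -> float:
    """Average sentence length in tokens, rounded to one decimal place."""
    return round(tokens / sentences, 1)


def vocab_reduction(before: int, after: int) -> float:
    """Relative vocabulary reduction in percent, rounded to one decimal place."""
    return round(100 * (1 - after / before), 1)


print(avg_length(4_782_063))             # Korean, initial form
print(avg_length(8_287_635))             # Korean, after morph. analysis and WSD
print(avg_length(5_958_096))             # Vietnamese
print(vocab_reduction(389_752, 68_719))  # Korean vocabulary shrinkage in percent
```

The recomputed averages agree with the reported 11.6, 20.1, and 14.5, and the Korean vocabulary falls by roughly 82% after analysis.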
Direction | System | Input | BLEU | TER |
---|---|---|---|---|
Korean-to-English | SMT | Original | 18.53 | 70.15 |
Korean-to-English | SMT | Word-sense Ann. | 24.21 | 65.81 |
Korean-to-English | NMT | Original | 21.28 | 68.19 |
Korean-to-English | NMT | Word-sense Ann. | 27.45 | 60.03 |
English-to-Korean | SMT | Original | 19.18 | 69.89 |
English-to-Korean | SMT | Word-sense Ann. | 20.58 | 69.42 |
English-to-Korean | NMT | Original | 23.57 | 66.38 |
English-to-Korean | NMT | Word-sense Ann. | 25.36 | 63.92 |
Korean-to-Vietnamese | SMT | Original | 20.69 | 68.94 |
Korean-to-Vietnamese | SMT | Word-sense Ann. | 23.47 | 66.75 |
Korean-to-Vietnamese | NMT | Original | 24.52 | 64.41 |
Korean-to-Vietnamese | NMT | Word-sense Ann. | 27.81 | 58.69 |
Vietnamese-to-Korean | SMT | Original | 10.13 | 71.58 |
Vietnamese-to-Korean | SMT | Word-sense Ann. | 22.31 | 67.05 |
Vietnamese-to-Korean | NMT | Original | 10.49 | 71.12 |
Vietnamese-to-Korean | NMT | Word-sense Ann. | 25.62 | 63.31 |

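To make the size of the improvements in the results table easier to read, the BLEU gain and TER change of each word-sense annotated system over its original counterpart can be computed directly from the reported scores. The Python sketch below copies the figures from the table; the data layout and function name are illustrative assumptions.

```python
# (direction, system) -> ((BLEU, TER) original, (BLEU, TER) word-sense annotated)
RESULTS = {
    ("Korean-to-English", "SMT"): ((18.53, 70.15), (24.21, 65.81)),
    ("Korean-to-English", "NMT"): ((21.28, 68.19), (27.45, 60.03)),
    ("English-to-Korean", "SMT"): ((19.18, 69.89), (20.58, 69.42)),
    ("English-to-Korean", "NMT"): ((23.57, 66.38), (25.36, 63.92)),
    ("Korean-to-Vietnamese", "SMT"): ((20.69, 68.94), (23.47, 66.75)),
    ("Korean-to-Vietnamese", "NMT"): ((24.52, 64.41), (27.81, 58.69)),
    ("Vietnamese-to-Korean", "SMT"): ((10.13, 71.58), (22.31, 67.05)),
    ("Vietnamese-to-Korean", "NMT"): ((10.49, 71.12), (25.62, 63.31)),
}


def deltas(key: tuple[str, str]) -> tuple[float, float]:
    """Return (BLEU gain, TER change); higher BLEU and lower TER are better."""
    (bleu_orig, ter_orig), (bleu_ann, ter_ann) = RESULTS[key]
    return round(bleu_ann - bleu_orig, 2), round(ter_ann - ter_orig, 2)


for key in RESULTS:
    bleu_gain, ter_change = deltas(key)
    print(f"{key[0]:<22} {key[1]}  BLEU {bleu_gain:+.2f}  TER {ter_change:+.2f}")
```

Every row shows a positive BLEU gain and a negative TER change, so the annotated input helps in all four directions for both SMT and NMT, with the largest jumps in the Vietnamese-to-Korean direction.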
© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Vu, V.-H.; Nguyen, Q.-P.; Shin, J.-C.; Ock, C.-Y. UPC: An Open Word-Sense Annotated Parallel Corpora for Machine Translation Study. Appl. Sci. 2020, 10, 3904. https://doi.org/10.3390/app10113904