1. Introduction
One of the biggest current challenges for machine translation (MT) is handling creative texts, such as literature and marketing content. These text types tend to contain a large amount of non-literal language, such as sarcasm, metaphor, and irony, as well as ambiguous elements that are likely to result in a word-by-word translation, thus compromising the rendering of the source text in the target language [1]. However, with the advent of neural MT (NMT) systems, researchers in the field of artificial intelligence have identified a window of opportunity to translate creative texts more efficiently [2,3], as NMT systems are reported to outperform their predecessors, statistical MT systems, because they are able to learn the similarity between words and consider the context of the entire sentence, rather than just n-grams [4].
While a number of studies (e.g., [2,3,5]) have investigated whether post-editing the MT output for literature might help literary translators in terms of productivity, translators perceive MT as less useful for creative texts [5] than for other text types. In accordance with this, Guerberof–Arenas and Toral [1], while attempting to quantify creativity in MT and post-edited (PE) literary texts, investigated whether the translation modes impact the reader experience. Their study showed that human translation (HT) scores higher for creativity than PE translations, although no statistically significant differences between HT, MT, and PE were found for reading experiences related to emotional engagement and narrative presence. These results suggest that MT might have just started to become a tool to be considered when translating creative texts, but it is still an open question whether there are characteristics typical of PE literary texts and whether these characteristics possibly make them less creative than HT texts. For that reason, more research on the MT output and post-editing for this textual domain is necessary.
In this work, we focused on the quest for the typical features of PE literary texts and the differences between PE texts and other comparable translated texts (MT and HT) for the detection of post-editese features [6]. We believe that researching the features of the PE literary texts and contrasting them with HT texts, the raw MT output, and their source texts will allow us to obtain a better understanding of the processes involved during the PE task and the influence of technology on the translation product of literary texts. In addition, we believe that awareness of these features can inform translators regarding the challenges they will face when using technology for translating creative texts.
According to Chesterman [7], the search for universal patterns in translation falls into two categories: (i) the search for universal patterns in translations through the comparison of features extracted from translated texts with features extracted from their source texts, and (ii) the search for patterns in translations through comparisons of features extracted from translations and comparable (i.e., same text genre) non-translations in the same language. Chesterman [7] calls the universal patterns found using source texts S-universals (S for source) and the universal patterns found using comparable non-translations T-universals (T for target). As our quest for the post-editese phenomenon involves capturing the differences between PE texts and other comparable translated texts (MT and HT), we focus on the features that have been associated in the literature [8,9,10,11] with the hypothetical T-universal features, namely, simplification, explicitation, and convergence.
The idea of T-universals is also associated with the idea of translationese [12], which is the term used to refer to the language typical of translated texts that strikes readers as strange. Thus, in the present study, we adopted the term translationese features when examining the T-universals. Following the rationale behind the extraction and analyses of the translationese features as described by Baker [8], linguistic features were extracted from two literary texts, namely, Lewis Carroll’s English children’s novel Alice’s Adventures in Wonderland and Paula Hawkins’ popular book The Girl on the Train, using a set of computational analyses with the purpose of identifying the existence of post-editese, i.e., features that are typical of PE texts. All features extracted from our corpus were compared across the HT version of the source text, the MT version of the source text, and nine (9) PE versions of the same MT output. As all translated versions originate from the same source text, we also extracted features from the source text as a reference data source.
Before presenting our methodology and the results of our experiments in detail, the next section presents an overview of the research in the field of translation studies addressing the features of translated texts as opposed to non-translated texts, as well as recent research focusing on the quest for post-editese features.
2. The Phenomenon of Post-Editese
In the field of translation studies, a number of research papers, e.g., [13,14,15,16,17], have shown that translated texts are statistically different from texts originally written in a certain language. Research has shown, for instance, that translated texts present less varied vocabulary and simpler syntax, as reflected by a lower type-token ratio, i.e., lower lexical richness, and a shorter mean sentence length than original texts [15,18,19]. Research has also shown that translated texts tend to be more similar to each other than non-translated texts [19]. These differences are the product of the translation process, which produces an interlanguage, the so-called translationese, that is, the language typical of translated texts [12], regardless of the source and target languages. According to Volansky et al. [15], the translationese phenomenon is the product of two coexisting forces that translators have to cope with during the translation process: fidelity to the source text and fluency in the target language. These two forces result in the strangeness of translated texts; that is, they result in the translationese phenomenon.
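The two surface measures mentioned above, type-token ratio and mean sentence length, can be illustrated with a minimal sketch. It assumes a crude regular-expression tokenizer and sentence splitter; the function names are hypothetical, and the studies cited may well rely on different tokenization and normalization choices.

```python
import re

def tokenize(text):
    # Lowercased word tokens; a crude stand-in for a proper tokenizer (assumption).
    return re.findall(r"\w+", text.lower())

def type_token_ratio(text):
    # Distinct word forms (types) divided by running words (tokens).
    tokens = tokenize(text)
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def mean_sentence_length(text):
    # Average number of tokens per sentence, splitting on ., ! and ?
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    if not sentences:
        return 0.0
    return sum(len(tokenize(s)) for s in sentences) / len(sentences)
```

Under this reading, a lower type-token ratio indicates less varied vocabulary, and a lower mean sentence length indicates shorter sentences. Note that the type-token ratio is sensitive to text length, so comparisons are only meaningful between texts of similar size.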
Inspired by Toury’s [20] norms of translation, Baker [8,9] proposed to investigate the linguistic and stylistic features of translated texts by looking for universal patterns that distinguish translated texts from non-translated texts using comparable corpora, naming these universal patterns translation universals. Therefore, translation universals are hypotheses of linguistic features common to all translated texts regardless of the source and target languages. The hypothetical universal features in translations proposed by Baker are simplification, explicitation, normalization (or conservatism), and leveling out (or convergence, as named by Corpas et al. [13]).
The set of hypotheses raised by Baker [8] on the characteristics common to all translated texts has aroused the interest of several researchers in the field of translation studies to investigate whether translationese features are manifested on the surface of translated texts. More recently, as the increased need for translation productivity in a globalized society resulted in the post-editing of the MT output, a number of studies, e.g., [6,21,22,23], from the natural language processing and MT fields have been discussing and investigating whether there are universal patterns typical of PE texts. Hence, the focus of attention has shifted from the typical features of HT texts to the typical features of PE texts.
Within the literature on translationese features, although several studies have shown that computers can distinguish, to a high degree of accuracy, between translations and originals [13,15,16,24], it is still unclear whether the same differences can be found between HT and PE texts. In contrast, the literature in the field of MT has shown some evidence that there might be differences between the MT output, its PE version, and HT texts. Several studies have shown, for instance, that the MT output differs from HT texts in terms of lexical variety. Vanmassenhove et al. [25] found that the processes of current MT systems cause a general loss in terms of lexical diversity and richness when compared to HT. Thus, this loss in vocabulary range in the MT output may influence the product of PE translations, consequently resulting in differences between PE and HT texts.
Another example is the study by Culo and Nitzke [26], who found that the terminology of PE texts is closer to the MT output than to HT. The work of Groves and Schmidtke [27] also provides a clue to the existence of the post-editese phenomenon. The researchers compared the raw MT output produced by Microsoft’s Treelet MT engine [28] with its PE counterpart, for English–German and English–French. They found that, in the English–German corpus, there were many cases of changes in case and gender of nouns, removal of commas and pronouns such as the German pronoun sie, and insertion of the determiner die. Similarly, in the English–French corpus, they found edits involving the deletion and insertion of the French function word de. Stylistic changes were also observed, such as changes in words with the same meaning. The edits common to both corpora were: edits involving punctuation with removal or insertion of commas, changes in part-of-speech (determiners) and other structural changes; adjuncts and prepositional phrases; and, in a smaller proportion, changes in terminology.
Despite the studies evidencing differences between MT output and PE texts, and between PE texts and HT texts, the study by Daems et al. [6] did not find evidence for the existence of post-editese. It was in this paper that the term post-editese was introduced, which the researchers define as “the expected unique characteristics of a PE text that set it apart from a translated text”. The study investigated whether humans are able to distinguish PE from HT texts, and whether a supervised machine learning model could distinguish HT from PE texts. The results showed that neither humans nor the machine could distinguish between the translation modalities.
Contrary to the results reported by Daems et al. [6], Castilho et al. [22] found evidence for the existence of post-editese while investigating the features of PE texts in a corpus composed of HT, MT, and PE texts in two domains: news and literature. The authors also tested whether the PE level, the translators’ experience, and the text domain influence the magnitude of the post-editese features. To this end, professional translators and student translators post-edited the MT outputs of the two domains at two different levels of post-editing: full PE, in which more modifications were allowed, and light PE, in which translators were asked to use as much of the MT output as possible. The results revealed evidence of post-editese features, as PE texts were found to be more similar to the raw MT output and source texts than to the HT texts.
Toral [21] also found evidence for the manifestation of post-editese in PE texts. The author investigated the post-editese phenomenon using a set of computational analyses of a corpus composed of several datasets containing HTs and the PE texts, including different language directions and domains. The author found that the PE texts are simpler and have a higher degree of interference from the source language than HT.
Considering this unclear scenario showing mixed results which leaves room for further discussion, in this article, we investigated the features of PE literary texts by comparing the features extracted from them with the features extracted from the raw MT output and the HT version of the source texts. As outlined previously, since the phenomenon has been found by several studies, we hypothesized here that the post-editese phenomenon would be found on the surface of PE literary texts as well, although manifested differently when compared to the translationese phenomenon emerging from HT texts.
Inspired by Gellerstam’s [12] definition of translationese, we defined in the present study post-editese as follows:
Post-editese is the difference between the characteristics of human-translated texts (HT) and the post-edited (PE) versions, in relation to the raw MT output.
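We proposed to extract and analyze a series of linguistic features that have come to define the post-editese phenomenon in MT, that is, the unique characteristics of PE texts that set them apart from HT texts. Our quest for post-editese features in literary texts is guided by an overarching research question:
What are the characteristics of the PE literary texts?
To answer that, we use the rationale behind three translationese features as described by Baker [8], namely, simplification, explicitation and convergence. Thus, two sub-questions are posed:
RQ1: Are the PE versions closer to the HT or to the MT and source in terms of the translationese features?
RQ2: Which translationese features (as described by Baker [8]) can also support the post-editese hypothesis?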
Based on the results encountered in the literature, we hypothesized that post-editese would be manifested as PE texts being closer to the MT output and source texts than HT texts are to either the source texts or the MT output. If we confirmed our hypothesis, i.e., if we observed differences in features between the PE and HT texts, we would take this as evidence for the existence of the post-editese phenomenon. Moreover, due to the difference in the genre of the two book excerpts (see Section 3.1), we hypothesized that the degree of these differences would vary between the PE and HT from these two excerpts, where one would require more edits than the other.
In the next subsections, we present the translationese features that are addressed in the present study. The examination of these features along with findings reported in the post-editese literature [6,21,22] guide our experiments and analysis. Based on the results of our experiments, we discuss how our study can contribute to the quality of post-edited literary texts.
5. General Discussion and Conclusions
In the present study, we investigated the existence of post-editese features in a corpus composed of excerpts from two different literary books: Alice’s Adventures in Wonderland and The Girl on the Train. While the former contains a rich language style, as the author plays on words, introducing puns and metaphors, the latter contains simple and relatively straightforward language where action and emotion prevail over the author’s writing style.
To answer our RQ1 “Are the PE versions closer to the HT or to the MT and source in terms of the translationese features?”, we used the rationale behind the hypothetical features described by Baker [8], namely simplification, explicitation, and convergence. Examining these features allowed us to investigate the differences between the HT and the PE versions and thus the existence of the post-editese phenomenon.
Table 18 shows a summary of our findings.
Regarding simplification, from Table 18 we see that the post-editese hypothesis was not supported for lexical richness (LR), sentence count (SC), sentence length (SL), or punctuation. Statistically significant differences between PE and HT texts were observed only for the lexical density feature in the TGOTT dataset. Thus, for this feature, we found that PE texts were indeed different from the HT text, and we also confirmed that the PE versions were closer to the MT versions, but in the TGOTT dataset only. The same was not true for the AW dataset, as we did not find statistically significant differences between PE and HT texts in any of the simplification features examined.
Regarding sentence count, the post-editese hypothesis was not confirmed for either dataset, as we found that the PE texts were similar to the HT texts. Finally, regarding punctuation, the qualitative analysis revealed that the HT punctuation differs from the source punctuation in both TGOTT and AW, as punctuation marks were used by translators to split sentences and simplify the text. We also noted that punctuation in PE tends to follow the MT punctuation more closely than the HT does. However, we did not confirm the post-editese hypothesis for the punctuation feature, as significant differences in punctuation counts were not found between the text types in any of the comparisons made.
Taking the results of simplification into consideration, our findings were mixed, as some simplification features were confirmed only for TGOTT, and none for AW. Thus, regarding the question of whether they are good features to support the post-editese hypothesis (RQ2 “which translationese features (as described by Baker [8]) can also support the post-editese hypothesis?”), our findings showed that lexical richness, sentence length, sentence count, and punctuation might be good indicators of the existence of post-editese in general, but they were not good indicators of its existence in our corpus.
As regards the explicitation features, post-editese was confirmed for both features, i.e., length ratio and personal pronoun (PP) ratio, for the TGOTT dataset, but not for the AW dataset. Taking the results of explicitation into consideration, we can answer (RQ2) that length ratio and personal pronoun ratio were good indicators of the existence of post-editese, but there was a difference between text sub-genres.
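To make the two explicitation measures concrete, the following is a minimal sketch under assumed definitions. This section does not restate the exact formulas used in the study, so both functions are illustrative only: here length ratio is taken as target length over source length in tokens, and pronoun ratio as the share of tokens found in a caller-supplied personal pronoun list. All names are hypothetical.

```python
def length_ratio(source_tokens, target_tokens):
    # Assumed definition: target length relative to source length, in tokens.
    if not source_tokens:
        raise ValueError("source must contain at least one token")
    return len(target_tokens) / len(source_tokens)

def pronoun_ratio(tokens, pronouns):
    # Assumed definition: share of tokens that appear in the supplied
    # personal pronoun set (the set is language-specific and user-provided).
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token.lower() in pronouns)
    return hits / len(tokens)
```

Under these assumptions, a translation that spells out more than its source (a higher length ratio, or more overt pronouns) would count as more explicit.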
Finally, the convergence analysis confirmed post-editese for all features examined, since we observed that PE variance scores were similar to MT variance scores in both datasets. Thus, convergence in our study is a good indicator of post-editese (RQ2).
Considering the results of all features together, we note that post-editese was not confirmed for most of the features within the AW dataset, but it was confirmed for more features in the TGOTT dataset. Nonetheless, our findings showed that the post-editese phenomenon was manifested on the surface of the post-edited texts as there were differences between those and HT versions. These differences were manifested in terms of the proximity or distance from the source and MT versions. While PE texts from the TGOTT dataset were closer to the MT output in a series of features, the features extracted from the HT texts were more distant from the source and MT versions.
The major contribution of this work is the answer to our overarching research question “What are the characteristics of the PE literary texts?”. Our findings demonstrate that there was a clear difference between the literary genres: While literary texts whose author’s style is full of figurative language pose a harder challenge to the MT system, texts that emphasize action over language style are less challenging. We validated this assumption based on our observation that the AW dataset involved more edits than the TGOTT dataset, suggesting that the MT output can express the meaning of the source text more efficiently for TGOTT than for AW. Moreover, we found a more visible pattern in terms of features for the TGOTT dataset when compared to the AW dataset which, in turn, was unstable in terms of pattern manifestation. This allowed us to confirm our post-editese hypothesis for some features in the TGOTT dataset but for none in the AW dataset. Thus, based on our results, the main characteristics of PE literary texts were that they were similar to the MT output in terms of lexical density, use of pronouns and sentence length. However, this scenario can be blurred by the sub-genre of the literary text.
Further analysis of different literary genres is necessary in order to answer our research question more comprehensively, and so the question of whether there are characteristics of PE literary texts that possibly make them less creative than HT texts remains open. Nevertheless, based on our results, we assumed that literary creativity in PE texts may be compromised, as shown by Guerberof–Arenas and Toral [1], due to the influence of the MT lexical and syntactic choices on the translators’ choices. As seen, the MT system produces a translation that tends to be as equivalent as possible to its source text. It is possible that, when post-editing the raw MT output, translators are primed by the MT choices even though they were instructed to change the text to achieve a high-quality translation for publication standards, thus resulting in a PE text similar to the MT output. Consequently, this effect pushes them to converge to an equivalence with the MT output, resulting in the manifestation of similar features and in the distortion of the writer’s language style. At the same time, this result may also indicate that NMT systems are achieving good quality literary translations, especially for literary texts in which action prevails over the author’s style, as translators did not need to interfere in the MT output to a great extent in order to obtain a high standard translation comparable to a high standard human translation.
Altogether, our results show that, when post-editing, translators should be aware of the priming effect of the raw MT output on their lexical and syntactic choices. The PE proximity to the MT output may result in distortion of the writers’ style and consequently influence the final product of the post-edited texts. This is therefore the major challenge translators face when post-editing literary texts.
It is noteworthy that we are aware of the limitations of our study. Although all translators were professional, with more than 2 years of experience, not all of them were literary translators, meaning they would not be as experienced in effectively commanding the tone, author’s style, and creativity when modifying the MT output to adapt it to the linguistic framework of the target language. Since we can assume that Google Translate provided ‘good’ quality translation based on the (h)TER scores (as seen in the low number of edits), these translators who were not experienced with literary texts could have accepted the MT output and kept most of the system’s lexical and syntactic choices, resulting in fewer differences.
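The intuition behind an (h)TER-style score can be sketched as follows. This is a deliberately simplified approximation, not the metric as used in the study: it counts only word-level insertions, deletions, and substitutions between the MT output and its post-edited version and normalizes by the post-edited length, whereas TER proper also allows block shifts. The function names are hypothetical.

```python
def word_edit_distance(hyp, ref):
    # Dynamic-programming Levenshtein distance over word tokens.
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # delete h
                            curr[j - 1] + 1,      # insert r
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def simple_edit_rate(mt_tokens, pe_tokens):
    # Edits needed to turn the MT output into the post-edited text,
    # divided by the post-edited length (no shift operation: a simplification).
    if not pe_tokens:
        raise ValueError("post-edited text must contain at least one token")
    return word_edit_distance(mt_tokens, pe_tokens) / len(pe_tokens)
```

A score close to zero under this approximation means the post-editor kept most of the MT output, which is the situation described above as a low number of edits.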
Another limitation is that the PET tool used for the translation might have restricted and biased the translators against using the 1-to-many or many-to-1 option, that is, splitting or joining sentences, even though the guidelines allowed translators to do that. We speculate that, perhaps, if the translation task had been set up in a word processor file, translators would have felt freer to split or join sentences, which would have given us different results. Finally, our study dealt with an unbalanced number of translated versions, with nine post-edited texts but only one human translation. This unbalanced dataset could have biased the results for the convergence feature, as we combined 9 PE texts, while for the source, HT, and MT, we computed the variance scores within the set of translated sentences of each translated version. Thus, a more balanced dataset with more human translations of the same text would provide us with more data that would allow us to run robust statistical analyses and, consequently, provide more evidence for or against the existence of the post-editese phenomenon.
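The imbalance described above can be illustrated with a small sketch. It assumes that convergence is measured as the population variance of some per-sentence score (for example, sentence length), computed within a single version for the source, HT, and MT, and over the pooled sentences of all versions for PE. Both the choice of score and the pooling strategy are assumptions made for illustration; the function names are hypothetical.

```python
from statistics import pvariance

def version_variance(sentence_scores):
    # Variance of a per-sentence score within one translated version.
    return pvariance(sentence_scores)

def pooled_variance(versions):
    # Variance over the sentences of several versions combined,
    # e.g. the nine PE versions treated as one set (assumed pooling).
    pooled = [score for version in versions for score in version]
    return pvariance(pooled)
```

Because the pooled figure mixes variation between post-editors with variation between sentences, it is not directly comparable to a single-version figure, which is one way to see why a more balanced set of human translations would help.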
Therefore, with further study in the literary genres and post-editese, we will be able to collect more characteristics of PE literary texts which will be relevant to inform translators regarding other challenges they will face when using technology for translating different creative texts.