Machine Translation: The Importance of Document-Level Evaluation

Research suggests that, when entire documents are evaluated, human translations are rated as more adequate and more fluent than machine translations: human raters show a stronger preference for human translation over machine translation when judging documents than when judging isolated sentences. This suggests that machine translation evaluation needs to evolve away from protocols that assess each sentence in isolation.

Editor’s Note: Given the quickening pace of discussions about the use of machine translation in support of data and legal discovery tasks, the following pieces may be helpful for considering, comparing, and contrasting human and machine translations at both the sentence level and the document level.

Human Translators Are Still On Top – For Now

An extract from an article posted on MIT Technology Review

You may have missed the popping of champagne corks and the shower of ticker tape, but in recent months computational linguists have begun to claim that neural machine translation now matches the performance of human translators.

The technique of using a neural network to translate text from one language into another has improved by leaps and bounds in recent years, thanks to the ongoing breakthroughs in machine learning and artificial intelligence. So it is not really a surprise that machines have approached the performance of humans. Indeed, computational linguists have good evidence to back up this claim.

But today, Samuel Läubli at the University of Zurich and a couple of colleagues say the champagne should go back on ice. They do not dispute their colleagues’ results but say the testing protocol fails to take account of the way humans read entire documents. When this is assessed, machines lag significantly behind humans, they say.

Read the complete article at Human Translators Are Still On Top – For Now

Has Machine Translation Achieved Human Parity? A Case for Document-Level Evaluation

An abstract from a research study by Samuel Läubli, Rico Sennrich, and Martin Volk

Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasize the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence-level become decisive in discriminating quality of different translation outputs.
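The pairwise ranking protocol described in the abstract can be illustrated with a small sketch. This is a hypothetical example, not the authors’ actual experimental setup or data: raters pick the better of two translations of the same text, ties are excluded, and the resulting preference counts are checked for significance, for instance with a two-sided sign test.

```python
from math import comb

def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Two-sided sign test: probability of a split at least this
    lopsided under the null hypothesis of no rater preference."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for two-sidedness.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# Hypothetical rater judgments: 'H' = human translation preferred,
# 'M' = machine translation preferred (ties already excluded).
judgments = ["H"] * 32 + ["M"] * 18
wins_h = judgments.count("H")
wins_m = judgments.count("M")
p = sign_test_p(wins_h, wins_m)
print(f"human preferred {wins_h}/{wins_h + wins_m}, p = {p:.3f}")
```

The point of the protocol is the comparison across conditions: run the same tally once for sentence-level judgments and once for document-level judgments, and see whether the preference for human translation strengthens with more context.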

Read the complete study at Has Machine Translation Achieved Human Parity? A Case for Document-Level Evaluation

In Human vs. Machine Translation, Compare Documents, Not Sentences

An extract from an article by Gino Dino

In their paper’s conclusion, Läubli, Sennrich, and Volk explain that NMT [Neural Machine Translation] is currently at a level of fluency where BLEU (bilingual evaluation understudy) scores based on a single model translation, and even sentence-level evaluations by non-professional human translators, are no longer enough.
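For context, BLEU scores a candidate translation by its modified n-gram precision against a reference, scaled by a brevity penalty. The following minimal single-reference sketch is illustrative only (real evaluations use standard implementations with multiple references and proper smoothing):

```python
from collections import Counter
from math import exp, log

def bleu(candidate: list[str], reference: list[str], max_n: int = 4) -> float:
    """Minimal single-reference BLEU: geometric mean of modified
    n-gram precisions (n = 1..max_n) times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # "Modified" precision: clip each n-gram count by its reference count.
        overlap = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        # Crude smoothing so a zero count does not zero out the score.
        log_precisions.append(log(max(overlap, 1e-9) / total))
    # Brevity penalty discourages overly short candidates.
    bp = min(1.0, exp(1 - len(reference) / max(len(candidate), 1)))
    return bp * exp(sum(log_precisions) / max_n)

ref = "the cat sat on the mat".split()
hyp = "the cat is on the mat".split()
print(f"BLEU = {bleu(hyp, ref, max_n=2):.3f}")  # → BLEU = 0.707
```

Because the metric matches n-grams sentence by sentence, it is blind to exactly the document-level phenomena the paper highlights, such as cross-sentence pronoun agreement and consistent terminology.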

“As machine translation quality improves, translations will become harder to discriminate in terms of quality, and it may be time to shift towards document-level evaluation, which gives raters more context to understand the original text and its translation,” the paper’s conclusion reads. It further explains that document-level evaluation reveals translation errors otherwise “invisible” in a sentence-level evaluation.

Read the complete article at In Human vs. Machine Translation, Compare Documents, Not Sentences

Additional Reading

Source: ComplexDiscovery