Editor’s Note: With discussion growing about the use of machine translation in support of data and legal discovery tasks, the following extracts may be useful for comparing and contrasting human and machine translation at both the sentence level and the document level.
Human Translators Are Still On Top – For Now
An extract from an article posted on MIT Technology Review
You may have missed the popping of champagne corks and the shower of ticker tape, but in recent months computational linguists have begun to claim that neural machine translation now matches the performance of human translators.
The technique of using a neural network to translate text from one language into another has improved by leaps and bounds in recent years, thanks to the ongoing breakthroughs in machine learning and artificial intelligence. So it is not really a surprise that machines have approached the performance of humans. Indeed, computational linguists have good evidence to back up this claim.
But today, Samuel Läubli at the University of Zurich and a couple of colleagues say the champagne should go back on ice. They do not dispute their colleagues’ results but say the testing protocol fails to take account of the way humans read entire documents. When this is assessed, machines lag significantly behind humans, they say.
Has Machine Translation Achieved Human Parity? A Case for Document-Level Evaluation
An abstract from a research study by Samuel Läubli, Rico Sennrich, and Martin Volk
Recent research suggests that neural machine translation achieves parity with professional human translation on the WMT Chinese–English news translation task. We empirically test this claim with alternative evaluation protocols, contrasting the evaluation of single sentences and entire documents. In a pairwise ranking experiment, human raters assessing adequacy and fluency show a stronger preference for human over machine translation when evaluating documents as compared to isolated sentences. Our findings emphasize the need to shift towards document-level evaluation as machine translation improves to the degree that errors which are hard or impossible to spot at the sentence-level become decisive in discriminating quality of different translation outputs.
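To make the pairwise ranking idea concrete, the following is a minimal sketch of how rater preferences might be tallied and checked for a significant preference. It assumes each judgment is recorded as one of the labels "human", "machine", or "tie", and it uses a two-sided sign test as one common way to test such data. The function names and the label scheme are hypothetical and are not taken from the abstract.

```python
from math import comb


def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Two-sided exact sign test p-value for paired preferences (ties excluded)."""
    n = wins_a + wins_b
    if n == 0:
        return 1.0  # no decisive judgments, so no evidence either way
    k = min(wins_a, wins_b)
    # Probability of a split at least this lopsided under a fair coin,
    # doubled to cover both directions and capped at 1.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


def summarize(ratings: list[str]) -> dict:
    """Count preferences from labels 'human', 'machine', or 'tie' (hypothetical scheme)."""
    human = ratings.count("human")
    machine = ratings.count("machine")
    ties = ratings.count("tie")
    return {
        "human": human,
        "machine": machine,
        "ties": ties,
        "p_value": sign_test_p(human, machine),
    }
```

Running `summarize` once on the sentence-level judgments and once on the document-level judgments would give two preference tallies that can be set side by side, which is the kind of contrast the abstract describes.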
In Human vs. Machine Translation, Compare Documents, Not Sentences
An extract from an article by Gino Dino
In their paper’s conclusion, Läubli, Sennrich, and Volk explain that NMT [Neural Machine Translation] is currently at a level of fluency where BLEU (bilingual evaluation understudy) scores based on a single model translation, and even evaluations of sentence-level output by non-professional human translators, are no longer enough.
“As machine translation quality improves, translations will become harder to discriminate in terms of quality, and it may be time to shift towards document-level evaluation, which gives raters more context to understand the original text and its translation,” the paper’s conclusion read. It further explained that document-level evaluation shows translation errors otherwise “invisible” in a sentence-level evaluation.
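For readers unfamiliar with BLEU, the sketch below shows the general shape of the metric when a candidate sentence is scored against a single reference: clipped n-gram precision combined with a brevity penalty. It is a simplified illustration that assumes whitespace tokenization and applies no smoothing. The function names are hypothetical, and this is not the reference implementation used in any shared task.

```python
import math
from collections import Counter


def ngrams(tokens: list[str], n: int) -> Counter:
    """Count all n-grams of length n in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def simple_bleu(candidate: str, reference: str, max_n: int = 4) -> float:
    """Simplified single-reference BLEU (no smoothing, whitespace tokens)."""
    cand = candidate.split()
    ref = reference.split()
    if not cand:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(cand, n)
        ref_counts = ngrams(ref, n)
        total = sum(cand_counts.values())
        # Clip each candidate n-gram count by its count in the reference.
        matched = sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        if total == 0 or matched == 0:
            return 0.0  # without smoothing, any empty precision zeroes the score
        log_precisions.append(math.log(matched / total))
    # Brevity penalty: penalize candidates shorter than the reference.
    bp = 1.0 if len(cand) > len(ref) else math.exp(1 - len(ref) / len(cand))
    return bp * math.exp(sum(log_precisions) / max_n)
```

Because a score like this is computed one sentence at a time against one reference, it has no way to register problems that only appear across sentences, such as inconsistent terminology within a document.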