It’s pretty widely known these days that Google Translate is far from perfect – its translations can have issues with accuracy and some pretty weird wording.
At other times though, it is so good it can’t be faulted.
This often perplexes people – why was it accurate last time, but so inaccurate and clumsy this time?
Well, when you understand how Google Translate works – how it produces its translations – you’ll see these sorts of quality issues are actually inevitable.
It couldn’t actually be any other way!
This article gives examples of the types of accuracy and quality problems experienced with Google Translate and explains how and why they occur.
This leads to suggestions as to when to use it and when to avoid it like the plague.
How Google Translate works – the source of its (in)accuracy problems
Many people are surprised to learn that Google doesn’t actually try to understand the meaning of each sentence as a whole, as a reader or human translator would.
Instead its translations are based purely on statistical occurrence.
Yep that’s right, Google Translate is all about maths and odds, not meaning!
Essentially Google has built up huge databases of existing translations produced over many years by human translators that they have adjudged to be reliable and accurate. Let’s hope they’re right on the reliability score, because this is the cornerstone for all their translations.
They have then developed a set of statistical matches. Basically they say that word or phrase x has been most frequently translated as word or phrase y in this language so that’s what we’ll always use for our translations, as it gives us the best odds of being correct.
So they break each sentence down into individual components (words or phrases) that they can find a match for in their database.
They then assess what the most frequently occurring translation for each component is, and string these together to create the translation of the sentence.
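To make the steps above concrete, here is a minimal sketch in Python of a "most frequent match" translator. It is not Google's actual code: the phrase table, its counts, and the names `PHRASE_TABLE`, `most_frequent` and `translate` are all invented for illustration, and the longest-match rule is just one simple way to break a sentence into components.

```python
# Hypothetical phrase table: source phrase -> {candidate translation: count}.
# The counts are invented for illustration; a real system would derive them
# from a large body of existing human translations.
PHRASE_TABLE = {
    "jouer": {"to play": 90, "to gamble": 10},
    "jeu": {"game": 85, "gambling": 15},
    "problèmes de jeu": {"problem gambling": 60, "gambling problems": 40},
    "peut être": {"can be": 95, "may be": 5},
    "amusant": {"fun": 80, "amusing": 20},
}
MAX_PHRASE_WORDS = 3  # longest phrase (in words) we try to match


def most_frequent(phrase):
    """Return the candidate translation with the highest count."""
    candidates = PHRASE_TABLE[phrase]
    return max(candidates, key=candidates.get)


def translate(sentence):
    """Split the sentence into known phrases and string together
    the most frequent translation of each one."""
    words = sentence.lower().split()
    output = []
    i = 0
    while i < len(words):
        # Try the longest phrase starting at word i, then shorter ones.
        for length in range(min(MAX_PHRASE_WORDS, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + length])
            if phrase in PHRASE_TABLE:
                output.append(most_frequent(phrase))
                i += length
                break
        else:
            # No match at all: pass the word through untranslated.
            output.append(words[i])
            i += 1
    return " ".join(output)


print(translate("jouer peut être amusant"))
print(translate("problèmes de jeu"))
print(translate("leur jeu"))
```

Notice that nothing in `translate` checks whether the finished sentence makes sense, and that `jeu` on its own always comes out as the more common "game", whatever the text is really about.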
The problems with this approach are twofold:
- Google Translate has no way of knowing when what it produces simply doesn’t make sense – when the sum of the individual parts doesn’t add up to a coherent whole.
- Any time a word is used with a meaning other than the most frequently occurring meaning for that word, they’ll choose the more common meaning and the translation will be wrong.
Let’s look at two examples to illustrate this.
Here’s a French paragraph we translated into English a while back, shown first with Google’s translation and then with Bing Translator’s for comparison.
|French source|Google Translate|Bing Translator|
|---|---|---|
|Pour la plupart des gens, jouer peut être amusant. Toutefois, certaines personnes, peuvent vite (ou avec le temps) perdre contrôle de leur jeu sans reconnaître les signes avant-coureurs d’avertissement qu’un problème est en train de naître.|For most people, playing can be fun. However, some people may soon (or time) to lose control of their game without recognizing the warning signs warning that a problem is being born.|For most people, play can be fun. However, some people can quickly (or time) lose control of their game without recognizing the warning signs of warning that a problem is being born.|
The translations have some issues with grammar and awkward wording, but generally you can get the gist.
You would assume the author is talking about play in general, and some problem that may arise from that.
Here’s how we translated the text:
Gambling can be fun for most people. However for some, losing control of their gambling can happen either quickly or over time, without recognising the early warning signs that there may be a problem developing.
So the text is actually talking about “gambling”, not “play”. Both Bing and Google missed this entirely, which gives a false impression of what the text is about and makes for a rather deficient and inaccurate translation of the paragraph.
This arises because the most common meaning of the French verb jouer is “to play” and noun jeu is a “game”. But these words can also mean “to gamble” and “gambling” respectively, as in this case.
And look what happens when the same paragraph is run through these free online translation engines again but this time with a preceding heading:
|French source|Google Translate|Bing Translator|
|---|---|---|
|PROBLÈMES DE JEU|problem gambling|gambling problems|
|Pour la plupart des gens, jouer peut être amusant. Toutefois, certaines personnes, peuvent vite (ou avec le temps) perdre contrôle de leur jeu sans reconnaître les signes avant-coureurs d’avertissement qu’un problème est en train de naître.|For most people, playing can be fun. However, some people may soon (or time) to lose control of their game without recognizing the warning signs warning that a problem is being born.|For most people, play can be fun. However, some people can quickly (or time) lose control of their game without recognizing the warning signs of warning that a problem is being born.|
This is interesting because Google has got the correct translation for the heading – “problem gambling”. Bing hasn’t quite got it, as “gambling problems” isn’t really the same concept as “problem gambling”. But it has at least got the “gambling” part, so that’s promising.
This presumably shows that in Google’s database the phrase problèmes de jeu has most frequently been translated by human translators as “problem gambling”, whereas in Bing’s corpus of texts it has been more often rendered as “gambling problems”.
But notice that the following paragraphs in both cases have remained unchanged after adding this heading.
Even though the heading makes it clear that the text is really about gambling and not play in general, both engines still fail to translate jouer and jeu correctly for this context.
Human translators would never make this mistake as they take notice of context.
But Google and Bing aren’t programmed to do that, and instead continue to play their percentages game and therefore produce inaccurate translations.
Ignoring what has gone before, that is, the context in which the words and phrases to be translated occur, cannot help but generate mistranslations such as these.
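The difference between playing the percentages and taking notice of context can be shown with a small Python sketch. Everything in it is a hypothetical illustration rather than a description of how Google, Bing or a human translator actually works: the counts are invented, and `CANDIDATES`, `TOPIC_OF`, `pick_by_frequency` and `pick_with_context` are made-up names.

```python
# Hypothetical candidate translations with invented frequency counts.
CANDIDATES = {
    "jouer": {"to play": 90, "to gamble": 10},
    "jeu": {"game": 85, "gambling": 15},
}

# Hypothetical mapping from a translation to the topic it belongs to.
TOPIC_OF = {"to gamble": "gambling", "gambling": "gambling"}


def pick_by_frequency(word):
    """Percentages game: always take the most common translation."""
    options = CANDIDATES[word]
    return max(options, key=options.get)


def pick_with_context(word, topic):
    """If the text so far has established a topic (for example from a
    heading), prefer a translation that belongs to that topic."""
    if topic is None:
        return pick_by_frequency(word)
    options = CANDIDATES[word]
    on_topic = {t: n for t, n in options.items() if TOPIC_OF.get(t) == topic}
    # Fall back to plain frequency when no candidate fits the topic.
    pool = on_topic or options
    return max(pool, key=pool.get)


# Without context, the common meaning wins every time.
print(pick_by_frequency("jeu"), "/", pick_by_frequency("jouer"))

# Once a heading has told us the text is about gambling:
print(pick_with_context("jeu", "gambling"), "/",
      pick_with_context("jouer", "gambling"))
```

The first function is the behaviour seen in the two machine translations above; the second is a crude stand-in for what a human translator does automatically after reading the heading.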
Our second example is a clause from a set of Chilean government regulations, again shown with Google’s translation and then Bing’s.
|Spanish source|Google Translate|Bing Translator|
|---|---|---|
|El semen, después de su recolección y hasta su despacho, fue conservado en contenedores exclusivos para la exportación a Chile o con semen que cumple como mínimo las mismas exigencias que las que se deben alcanzar para exportar a Chile y separado de cualquier otro semen.|The semen after collection until his office was preserved in unique containers for export to Chile or semen that fulfills at least the same requirements as those that must be met for exports to Chile and separate from any other semen.|Semen, after collection and until his office was preserved in unique containers for export to Chile or with semen that meets the same requirements as that to be achieved for exporting to Chile as a minimum and separated from any other semen.|
These translations also have some issues with grammar and awkward wording, but again you can get the general gist, particularly from the Bing translation.
But how does the reference to an office fit in? What office, and who is the “his” referring to?
Our human translation reveals the answer:
After collection and until its dispatch, the semen must have been kept in containers exclusively for export to Chile or with semen that at a minimum complies with the requirements for export to Chile, and apart from any other semen.
There is no office. One meaning of Spanish despacho is indeed “office”, and presumably that is the most common use of the word, but it also means “dispatch”, which is the meaning intended here.
Again, this would be obvious to any Spanish reader and to any human translator, but both the Bing and Google Translations are inaccurate.
Also note the change of tense. The Google and Bing machine translations use the past tense as though this was an event that actually took place.
In contrast, the human translator uses a much more appropriate modal construction to express the obligation to do something.
How widespread are Google Translate accuracy issues?
In my experience, it is reasonably common for Google Translate to produce some poor-quality (awkward or unnatural) wording, but you can usually still get the gist of the text.
Once in a while, though, what it comes up with just won’t make any sense at all.
As regards translation accuracy, bear in mind that Google doesn’t attempt to understand the meaning of a sentence as a whole, and ignores context. Instead it strings bits of text together based purely on statistics.
Because of this, its translations will inevitably be wrong whenever a word appears with a meaning that isn’t the most common meaning of that word, as we’ve seen in our examples.
English homographs (words with the same written form but different meanings) number in the thousands. For example, James Hobbs lists 2,133 homographs in common American usage in Homophones and Homographs: An American Dictionary, and that list is not exhaustive.
No doubt other languages are similar, so words with more than one meaning are relatively commonplace.
A related issue is that a single word/concept may have to be translated in different ways in another language in different contexts.
A simple example is the Spanish financial term crédito which can be translated as either “credit” or “loan” in English. It’s essentially the same concept in Spanish, but in English it would get very confusing if we talked about a credit when we meant a loan.
I imagine though that Google and Bing will be programmed to always translate this word as “credit”.
Thus although a good percentage of Google and Bing translations are spot on, it is not uncommon for them to contain mistranslations and to give an inaccurate meaning.
How can I know if a particular Google translation is OK?
Unfortunately you probably can’t.
You could, for example, ask a target language reader to read through the free online translation. They should immediately identify anything that simply doesn’t make sense and any instances of unclear and unnatural wording.
However, they may not pick up actual inaccuracies (mistranslations). Unfortunately these can be quite subtle and not readily apparent (like our play/gambling example above), so the translation may seem OK, but actually be incorrect in some parts.
And even if the objective is just for the reader to understand the gist of the source text, different readers will often interpret unclear or ambiguous text in different ways. Therefore any awkward or unnatural wording in the translation might lead that reader to a wrong interpretation of what the original text intended to say.
The only sure way to confirm any translation is of good quality is for someone who knows both languages to systematically and thoroughly compare the two texts.
I would suggest that if you’re at all concerned over whether a translation will meet your quality requirements, you really shouldn’t be using Google Translate to generate that translation.