The History of Machine Translation
The concept of machine translation has existed for centuries, but it was not until the early 1950s that it began to become a reality. Since then, machine translation has advanced significantly. However, it still cannot compete with the skill and finesse that a human mind can apply to translating a document.
Where did machine translation start?
In 1949, Warren Weaver of the Rockefeller Foundation put together a set of proposals on how to turn the idea of machine translation into reality. He blended information theory, code-breaking lessons learned during the Second World War, and the principles of natural language to pave the way for machines to translate one language into another.
One of the earliest machine translation successes was the Georgetown-IBM experiment. In 1954, IBM demonstrated at its New York office a machine that could translate Russian sentences into English. Though the machine worked with a vocabulary of only 250 words and translated just 49 sentences, the world was delighted by the idea. Interest in machine translation grew around the world, and money poured into this new field of computer science. The Georgetown experiment researchers, bursting with the confidence of their initial success, predicted that machine translation would be mastered within three to five years.
Challenges
Despite the early confidence, machine translation proved to be much harder than researchers at the time thought it would be. This is borne out by the fact that it still hasn’t truly been mastered, more than 60 years later.
Bilingual dictionaries, generative linguistics and transformational grammar were used to enhance the technology behind the Georgetown experiment. However, semantic ambiguity was quickly identified as an issue: if a word could mean more than one thing, how would the computer know which meaning was intended in the source language, and thus which word to choose in the target language?
While early machine translations were of sufficient quality to provide a basic understanding of the original document, they were a long way from perfect. The race (largely between the US and the Soviet Union) to conquer machine translation was taking far longer than expected. The US Automatic Language Processing Advisory Committee (ALPAC), established in 1964, dealt a blow to the US’s efforts by reporting that machine translation was essentially not worth the trouble or the expense. It recommended that resources focus instead on automated tools (such as dictionaries) to support human translators in their work.
Machine Translation outside of the US
Despite the US’s decreasing interest in machine translation (with the exception of one or two notable private enterprises), other countries continued their efforts. By the 1970s, Canada had developed the METEO System for translating weather reports from English to French. The system translated some 80,000 words per day, and its quality was good enough that it remained in use from 1977 to 2001, when it was replaced by a new system.
In other areas, globalization was pushing the need for machine translation like never before. France, Germany, the Soviet Union and the UK were all working hard to crack machine translation. If the art of translating using computers could be perfected, the cost and time savings for translating documents would be incredible. This knowledge spurred many governments and private companies to continue their efforts, but still, the perfect machine translation system eluded them.
Japan, in particular, was looking to lead the charge in the 1980s and early 1990s, and by the end of the 1990s, the growing availability (and power) of computers meant that the cost of machine translation efforts had come down considerably.
The 2000s saw some of the world’s largest technology companies focusing on machine translation with even more fervor. Alongside the Japanese efforts, Google and Microsoft in the US invested considerably in statistical machine translation. Those efforts later included blending statistical systems with syntactic and morphological knowledge in the quest for better results.
Machine translation and deep learning
Most recently, the big players (Google, Facebook and their ilk) have become fascinated by the use of neural networks and deep learning for perfecting machine translation. The neural network is loosely modelled on the way the human brain functions, with artificial neurons sending signals to other neurons when activated. Speech recognition and computer vision have both made significant leaps forward as a result of neural networks. Machine translation has also benefitted.
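To make that idea a little more concrete, here is a toy Python sketch of a single artificial neuron. The inputs, weights and threshold are invented values used purely for illustration; real translation systems chain together millions of these units and learn their weights automatically.

```python
# Toy illustration of one artificial neuron: it sums weighted input
# signals and "fires" only if the total passes a threshold.
# All numbers here are made up purely for demonstration.
def artificial_neuron(inputs, weights, threshold=0.5):
    activation = sum(signal * weight for signal, weight in zip(inputs, weights))
    return 1 if activation > threshold else 0

# Two incoming signals with hypothetical connection weights.
print(artificial_neuron([1.0, 0.2], [0.6, 0.3]))  # 0.66 > 0.5, so the neuron fires: 1
print(artificial_neuron([0.1, 0.2], [0.6, 0.3]))  # 0.12 <= 0.5, so it stays silent: 0
```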
Google reported in 2016 that it had made a significant step forward with machine translation. Google Translate had already been operating for a decade, but the switch to a neural network marked a step change from often clumsy translations to far more impressive results. This was thanks to the Google Neural Machine Translation (NMT) system.
In brief, Google’s NMT translates whole sentences instead of individual words or small groups of words. It works by using an encoder to break each sentence down, representing the meaning of its constituent parts as vectors. As The Register so succinctly explains:
“The system interprets the whole sentence, and the decoder begins to translate each word by looking at the weighted distribution over the encoded vectors and matching them up to the most relevant words in the target language.”
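As a rough, hypothetical illustration of that description, the Python sketch below shows the attention step in miniature: the decoder’s state is scored against the encoded source vectors, the scores become a weighted distribution, and the vectors are blended into a single context vector. The numbers, dimensions and variable names are invented for demonstration and are not taken from Google’s actual system.

```python
import numpy as np

def softmax(scores):
    """Turn raw similarity scores into a weighted distribution that sums to 1."""
    exp = np.exp(scores - scores.max())
    return exp / exp.sum()

# Pretend the encoder has turned a three-word source sentence into one
# vector per word (in a real system these come from a trained network).
encoded_source = np.array([
    [0.9, 0.1, 0.0],   # vector for source word 1
    [0.2, 0.8, 0.1],   # vector for source word 2
    [0.0, 0.3, 0.9],   # vector for source word 3
])

# The decoder's current state while it chooses the next target word.
decoder_state = np.array([0.7, 0.2, 0.1])

# Score each encoded vector against the decoder state, convert the scores
# into a weighted distribution, then blend the vectors into a single
# context vector that guides the choice of the next target word.
scores = encoded_source @ decoder_state
weights = softmax(scores)
context = weights @ encoded_source

print("attention weights:", weights.round(3))
print("context vector:", context.round(3))
```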
Interestingly, Google’s NMT system then took the learning one step further by beginning to translate between language pairings that it hadn’t been taught. Programmers had taught the system to translate between English and Portuguese, and between English and Spanish. The system itself then became capable of producing reasonable-quality translations between Portuguese and Spanish, even though its programmers hadn’t input that language pairing.
Google’s researchers branded this capability ‘zero-shot translation’ and believed that their NMT system had provided the first example of true transfer learning in machine translation. This was a significant leap forward.
In October 2020, Facebook announced another milestone: the “first multilingual machine translation (MMT) model that can translate between any pair of 100 languages without relying on English data.” This approach promises further gains in MT quality because there is less opportunity for meaning to get lost in translation.
Machine translation today
The evolution of machine translation has undoubtedly given many human translators the jitters. This has been the case ever since the widely reported success of the Georgetown experiment in 1954. At that time, many translators worried that they would be out of a job in a few years’ time. Many translators today feel the same way.
Despite these concerns, machine translation is not yet sophisticated enough to perform better than human translators. This was put to the test in February 2017, during an epic competition organized by Sejong Cyber University and the International Interpretation and Translation Association of Korea. Four humans and three machine translations (Google Translate, Systran’s translation program, and Naver’s Papago app) took part.
Three (human) translators judged the results based on accuracy, language expression, logic and organization. While the machines translated the four test documents faster, the humans won hands down, with a top score of 49 out of a possible 60 points. The highest machine score, achieved by Google Translate, was 28. Subtlety of expression and emotion, in particular, were highlighted as being beyond the grasp of the machines.
Machine translation pros and cons
Pros
There are certainly advantages to machine translation. A machine can translate in minutes something that would take a human an hour or more. As well as saving time, this can deliver significant cost savings.
Machine translation certainly does have its place. For companies with long, repetitive documents that are for internal consumption only, machine translation will suffice on many occasions. The same also applies to those looking to gain a basic understanding of documents written in another language. Machine translation can be used to translate such documents to a standard that will suffice for the casual reader.
Cons
The key word above is ‘suffice.’ While many companies that are new to translation start off by using a computer to meet their needs, they quickly find that the quality of the translation only suffices for a basic understanding of the content. The translation generated by the machine is not of the quality required for a professional business document.
This is where humans retain their lead over machines: humans produce higher-quality translations than machines do. The Sejong Cyber University competition confirmed this in 2017, and it’s still true today in the real world, despite headlines that claim “human parity” has been reached or even exceeded.
For example, earlier this year, researchers from the University of Amsterdam published a paper called “The Unreasonable Volatility of Neural Machine Translation Models.” As it turns out, even slight changes in a source sentence (such as changing a number or the gender of the subject) can result in a surprisingly different output. According to Slator, “the systems clearly do not demonstrate a good understanding of the underlying sentence parts — if they did, they would not generate the inconsistencies observed.”
Some experts also argue that the current standards for evaluating machine translation output and comparing it to that of human translators need revision. For example, evaluations are usually conducted on a sentence-by-sentence basis, which means that the human evaluators don’t have the full context available when they grade the translations. This means that some errors, omissions and inconsistencies in the MT output are not being properly accounted for when researchers tout studies that “prove” machine translation is on par with human translation.
Errors are especially likely in less commonly translated language pairings and for languages that differ greatly from English, such as Arabic, Chinese and Japanese.
There’s also the fact that machine translation’s mistakes can cause reputational and political damage. Some of Google Translate’s highest-profile mistakes so far have included referring to the Russian Federation as ‘Mordor’ when translating from Ukrainian, and mistranslating the Galician word ‘grelo’ (a leafy green vegetable) as ‘clitoris,’ which led the town of As Pontes to inadvertently advertise a ‘clitoris-tasting festival.’
The future of machine translation
Given recent leaps forward in MT technology, it would be easy to predict that machines will be able to translate as competently as humans in just a few years. However, that same projection was made back in the 1950s and still hasn’t come true. History has given us reason to doubt the capability of machines when it comes to translation, despite the incredible power of modern computing systems. For the time being, it can be a helpful tool, but it needs to be combined with careful post-editing from a human linguist as part of a robust quality-control process, as we do in our MT solution.
Machine translation has come a long way since the ‘50s, but it still has a very long way to go before it can match the linguistic subtlety that the human brain can deliver. Until that time arrives, our exceptional team of expert human linguists is standing by to deliver translation of your documents at a level of quality that no machine can match.