Big Data Analytics and Machine Translation for a Life-Saving Solution
While I am passionate about all the ways language finds its way into technology – as a new user-interface input via speech recognition, or through natural language understanding – machine translation holds a special place for me as my first step into linguistic technologies. Recently I read about an excellent and innovative use case for machine translation: translation automation for managing the consequences of natural disasters. Mission 4636 was a crowdsourcing project that provided immediate translation services during the 2010 Haiti earthquake, allowing English- and French-speaking rescue services to communicate with Creole-speaking Haitians in order to identify new victims, geolocate emergencies and coordinate rescue operations.
The leaders of the Mission 4636 project have since drafted “A Cookbook for MT in Crisis Situations”[i], explaining how machine translation integrated into social media and SMS services “can dramatically increase the speed by which relief can be provided”. Such a machine translation service should be domain-oriented and targeted specifically at disaster recovery, with subdomains such as earthquake, tsunami, nuclear disaster and flooding.
Leaving aside the project's humanitarian and social value – which is immense – it is also remarkable how this use case allows different technologies to work together in a single, highly efficient solution.
The Cookbook suggests building domain-specific translation memories in different languages – in other words, structured data prepared for further exploitation. The big data of linguistic corpora in many languages (including rare and mostly oral ones) should be annotated semantically, grouping words by meaning into families that belong to a domain and subdomain. The next step is to apply statistical machine translation to the identified messages and social media posts. This identification relies on natural language understanding techniques that capture the meaning of phrases and make it accessible to the machine translation engine.
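To make the annotation step concrete, here is a minimal sketch – with entirely hypothetical word families and subdomain names – of how words grouped by meaning into domain families could route an incoming SMS to the matching disaster-recovery subdomain before translation:

```python
from collections import Counter

# Hypothetical semantic families: words grouped by meaning and tagged
# with a disaster-recovery subdomain (illustrative data only).
SEMANTIC_FAMILIES = {
    "earthquake": {"quake", "tremor", "aftershock", "collapsed", "rubble"},
    "flooding": {"flood", "water", "rising", "submerged", "levee"},
    "medical": {"injured", "bleeding", "trapped", "doctor", "medicine"},
}

def route_message(text):
    """Return the subdomain whose family matches the most words, or None."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    scores = Counter()
    for subdomain, family in SEMANTIC_FAMILIES.items():
        scores[subdomain] = len(words & family)
    subdomain, hits = scores.most_common(1)[0]
    return subdomain if hits > 0 else None

print(route_message("People trapped under rubble after the aftershock"))
```

A real system would of course use far richer annotation (morphology, synonyms, Creole orthographic variants), but the principle – meaning-based families selecting a domain-specific translation memory – is the same.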
Similar use cases of semantic and data-mining techniques are presented by Google Flu Trends, which identifies “outbreaks of influenza”[ii]. One can see that big data techniques – data structuring, visualization and mining – are a crucial component here; natural language understanding and machine translation then build on top of them.
These language techniques are powered by an artificial-intelligence mechanism – recurrent neural networks – a technology “responsible for some of the significant improvements in language understanding, including the machine” as explained in a Gigaom article[iii]. This technology allows the system to understand the meaning of the source sentence and render it into the target language, using statistical probability combined with the “distribution of the likeliest translation” of the surrounding words.
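The idea of letting surrounding words decide the likeliest translation can be illustrated with a toy sketch – the probabilities and word choices below are invented for illustration, not taken from any real system:

```python
# Toy context-conditioned translation probabilities for an ambiguous word:
# P(translation | neighbouring context word), illustrative values only.
AMBIGUOUS = {
    "bank": {
        "river": {"riverbank": 0.9, "financial institution": 0.1},
        "money": {"riverbank": 0.05, "financial institution": 0.95},
    }
}

def likeliest_translation(word, context_words):
    """Pick the translation best supported by the surrounding words."""
    dists = AMBIGUOUS.get(word)
    if dists is None:
        return word  # no ambiguity recorded: pass the word through
    scores = {}
    for ctx in context_words:
        for translation, p in dists.get(ctx, {}).items():
            # Combine evidence from each context word multiplicatively.
            scores[translation] = scores.get(translation, 1.0) * p
    if not scores:
        return word
    return max(scores, key=scores.get)

print(likeliest_translation("bank", ["money", "transfer"]))
```

A recurrent neural network learns such context-dependent distributions from data rather than from a hand-written table, but the decision it makes – scoring candidate translations by the surrounding words – is the one sketched here.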
When discussing the technological trends of 2015, AnRCloud mentioned big data and natural language processing. The examples above show that these technologies are not new in themselves and have already begun to demonstrate their importance. The most interesting and crucial tendency is that all these tools are now being used together, in mutually supporting collaboration. According to AnRCloud, this symbiosis is the most important trend to follow.
[i] Crisis MT: Developing A Cookbook for MT in Crisis Situations by William D. Lewis, Robert Munro, Stephan Vogel, available online at http://research.microsoft.com/pubs/152760/wmt-10.pdf, accessed on February 5, 2015
[ii] Social Media in Disaster Relief by Peter M. Landwehr and Kathleen M. Carley in Data Mining and Knowledge Discovery for Big Data, edited by Chu, Wesley W., Springer, 2014, p. 250
[iii] How AI can help build a universal real-time translator by Stacey Higginbotham for Gigaom, January 29, 2015, available online at https://gigaom.com/2015/01/29/how-ai-can-help-build-a-universal-real-time-translator/, accessed on February 4, 2015