Multilingual Big Data: Why And How

Big Data and cross-language processing can be a very happy marriage that will open the way to a new generation of powerful multilingual products. So what is still missing to make it happen? What is the profile of the individual who will bring multilingual Big Data to life? Read about it below.

Big Data is a term used to describe the increasing amount of structured and unstructured information and the methods used to process it. Big Data currently attracts a lot of attention in the industrial and public sectors: the Big Data industry is estimated to become a $24 billion market by 2016. For the past few years, Big Data has been a buzzword in the global digital universe; however, like any new technology, it still raises many issues.

Since the dot-com boom of the 1990s, many industries have struggled to identify which directions of Big Data development are strategically important for their businesses and which ideas deserve a place on the long-term product roadmap.

The Big Data community faces a number of challenges, including:

  • Interoperability: a lack of standardisation in data collection across a great variety of possible data sources (Internet, sensor data, social media, mobile platforms, etc.);
  • A shortage of efficient data storage platforms;
  • Data processing challenges (scalable relational databases, choice of development framework);
  • A lack of best practices for extracting structured information from noisy data;
  • A vast number of available but unproven machine learning and data analytics techniques, and a lack of best practices in the context of huge datasets;
  • A lack of efficient use of parallel computing and data security for Big Data applications;
  • An absence of a widely accepted flexible pricing model;
  • A lack of commercial data visualization solutions and open source tools compatible with Big Data tools;
  • A lack of skilled data science professionals. Big Data technology dictates specific requirements for the skillset of data analysts, software engineers and managers that differ from classical data analytics skills.

Another topic that players in the Big Data market find indisputably promising, and that still requires a lot of attention, is bringing multilingualism to Big Data. Approximately 80% of Internet users are non-native English speakers, so there is a clear need for a multilingual Big Data solution. Multilingual data processing is crucial for both corporations and end users: it will help them quickly spot critical information in any language and ensure that data in “foreign” languages is not ignored.

To put this into a real-world context, incorporating real-time translation and linguistic capabilities into a Big Data application will open new horizons for a data-driven business. It will also make it possible to link dynamic and static information in different languages on a global scale, leading to language-independent corporate search and giving the company more global insights from an analytical perspective.
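To make the idea of language-independent search a little more concrete, here is a minimal sketch in Python. It assumes a "pivot language" design in which every document is translated into English before indexing; that design, the toy glossary, and the names `translate_to_english`, `build_index` and `search` are illustrative assumptions, not part of any particular product. In practice a real machine translation engine would replace the glossary lookup.

```python
from collections import defaultdict
from typing import Callable, Dict, Set, Tuple

# Hypothetical stand-in for a machine translation system: a tiny word-level
# glossary. A real deployment would call an MT engine here instead.
TOY_GLOSSARY: Dict[str, Dict[str, str]] = {
    "de": {"rechnung": "invoice", "verspätet": "delayed", "lieferung": "delivery"},
    "es": {"factura": "invoice", "retrasada": "delayed", "entrega": "delivery"},
}


def translate_to_english(text: str, lang: str) -> str:
    """Translate word by word with the toy glossary; unknown words pass through."""
    glossary = TOY_GLOSSARY.get(lang, {})
    return " ".join(glossary.get(word, word) for word in text.lower().split())


def build_index(
    docs: Dict[str, Tuple[str, str]],
    translate: Callable[[str, str], str] = translate_to_english,
) -> Dict[str, Set[str]]:
    """Map each English (pivot-language) term to the ids of documents containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, (lang, text) in docs.items():
        pivot_text = text.lower() if lang == "en" else translate(text, lang)
        for term in pivot_text.split():
            index[term].add(doc_id)
    return index


def search(index: Dict[str, Set[str]], query: str) -> Set[str]:
    """Return the ids of documents that contain every term of the English query."""
    terms = query.lower().split()
    if not terms:
        return set()
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return result


# Documents are stored as id -> (language code, text).
docs = {
    "d1": ("en", "Delayed delivery of the invoice"),
    "d2": ("de", "Lieferung verspätet"),
    "d3": ("es", "Factura retrasada"),
}
index = build_index(docs)
print(sorted(search(index, "delayed")))
```

With this setup, a single English query such as "delayed" matches the English, German and Spanish documents alike. Translating at indexing time keeps queries cheap; translating the query instead would be the other obvious option, at the cost of one translation per target language for every search.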

One of the unfilled niches in this industry is the creation of a technical framework and a business model for integrating machine translation and other cross-language data processing techniques into Big Data. Still cutting-edge, but already accepted in the translation industry, machine translation will provide an adjustable balance between speed and quality for corporate and consumer-oriented Big Data solutions.

This task is challenging in both technical and management terms. Whoever is willing to address this gap will have to deal with numerous product design and implementation challenges. He or she needs a macro perspective and an advanced understanding of the global economy, and must (1) know how to capture insights to make better decisions about market expectations for tangible and transparent multilingual Big Data services, (2) command a toolkit for working out a fair price for such a service, and (3) be able to understand the goals, politics and concerns of each group of stakeholders involved in the global-scale Big Data industry.

Do you think you have the right combination of technical knowledge and business skills to successfully manage a large-scale innovation project?

Then this opportunity can be for you.


About the author

Maxim Khalilov, PhD is the Machine Translation CTO at Matrix Communications AG (Germany) and the co-founder of He is a former R&D manager at TAUS, post-doctoral researcher at the University of Amsterdam, and a PhD student at the Polytechnic University of Catalonia (Spain).

