Leveraging university knowledge in the real-world workflow.
This time, we would like to present a post written by one of our regular visitors, Irina Popova. She is searching for a new career challenge after her recent graduation from the University of Stuttgart and would like to share one of her first real-world work experiences.
In this post, I am going to describe my internship at Wordflow Translation & Software Localization, during which I had the opportunity to apply my statistical machine translation skills in a translation agency workflow.
I studied Computational Linguistics at the University of Stuttgart (you can find a link to a description of my program here) and completed a Master's thesis in statistical machine translation, an area of Natural Language Processing that has always attracted me.
While working on my thesis, I realized at a certain point that to start a decent career I needed to do an internship. This would help me gain some experience and understand in which direction I would like to go next.
At that moment, there were no open trainee positions for pure machine translation interns, either in academia or at big translation buyers. However, I applied for a trainee position at a translation agency and SAP partner, Wordflow Translation & Software Localization, and got accepted. Once all the formalities regarding contract details were settled and, more importantly, the process of making the internship part of my studies was complete (an internship is not obligatory for a Master's degree at the IMS), I started to work at Wordflow Translation & Software Localization.
For a start, I learned mostly about CAT tools (Trados Studio 2011, WinAlign and Transit). The goal was to get an idea of the core translation process: what kind of tools a human translator usually uses and which pre- and post-processing steps a text has to go through before the customer receives the translation. Namely, these steps are alignment, the actual translation, terminology treatment, proofreading and, if the translation is accepted, updating the translation memory. It was interesting to have a look at the way a translation memory works: basically, the matching process is very similar to the one humans perform with extended dictionaries. If a phrase has already been translated, a translation suggestion pops up. If not, the translator has to translate the current segment from scratch.
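To give a feel for that matching idea, here is a minimal sketch in Python; it is not the actual logic of Trados or Transit, just an assumption of how a fuzzy lookup against a toy translation memory could work, with a made-up 75% threshold:

```python
from difflib import SequenceMatcher

# Toy translation memory: source segment -> stored translation
translation_memory = {
    "Save the file before closing.": "Speichern Sie die Datei vor dem Schliessen.",
    "Click the OK button.": "Klicken Sie auf die Schaltflaeche OK.",
}

def suggest(segment, threshold=0.75):
    """Return the best fuzzy match from the memory, or None if below the threshold."""
    best_score, best_pair = 0.0, None
    for source, target in translation_memory.items():
        score = SequenceMatcher(None, segment.lower(), source.lower()).ratio()
        if score > best_score:
            best_score, best_pair = score, (source, target)
    return (best_score, best_pair) if best_score >= threshold else (best_score, None)

print(suggest("Save the file before closing it."))  # high fuzzy match -> a suggestion pops up
print(suggest("Restart the application."))          # no match -> translate from scratch
```

Real CAT tools use more sophisticated similarity measures and indexing, but the principle of "suggest if similar enough, otherwise start from scratch" is the same.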
The steps before and after the actual translation are usually not considered in academia, where the primary goal is to learn fundamental concepts. Once a system is working, a user wants to train it and receive better output and/or higher BLEU scores. But in many cases adopters do not pay much attention to how the system is introduced to the real world: which criteria should be fulfilled, and what should be done to complete the translation process and deliver the output. The crucial idea of output adaptation is to make the text understandable not only for the user of the translation-assistance tool, but also for other people, who are interested mainly in the text contents and not in the rules of its creation.
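For readers unfamiliar with it, the BLEU score mentioned above is typically computed with an off-the-shelf package; a small sketch using sacrebleu, with entirely made-up example sentences, could look like this:

```python
import sacrebleu  # pip install sacrebleu

# Hypothetical system output and one stream of reference translations
hypotheses = ["the cat sat on the mat", "there is a book on the table"]
references = [["the cat sat on the mat", "a book is on the table"]]

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.2f}")
```

In academia this single number is often the end of the story; in industry it is only one input into decisions about cost, post-editing effort and delivery format.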
The difference in the final goal is the first and probably the most difficult challenge one has to understand and accept when moving from the university environment to industry. It involves some adaptation.
In practice, the adaptation steps include understanding the industrial system requirements:
In academia, I was working with Moses, which usually expects only plain text as input. Moses users typically build a system with the desired parameters, train it and receive output in the same format. Once you get your first translation, you can start to modify your system and run preprocessing steps (parsing, POS tagging, stemming, lemmatization, or any NLP tool serving your purposes). A point to highlight: you are always working with the same file format. The input file does not contain many tags that are expected to be carried over to the output. In a real translation process, keeping tags is one of the most crucial points: in the majority of cases, your customers want to receive translated documents with the same tag information as the input. If your machine translation engine accepts only plain text, you risk spending more time on post-editing the output, inserting the missing tags and correcting the layout.
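As a small illustration of the tag problem, here is a sketch of the kind of wrapper a plain-text engine may need: inline tags are replaced by placeholders before translation and restored afterwards. The translate_plain_text function is only a stand-in for whatever the real engine provides, and the whole scheme assumes the engine leaves the placeholders untouched, which is exactly what cannot be taken for granted in practice:

```python
import re

TAG_PATTERN = re.compile(r"<[^>]+>")  # very naive inline-tag matcher

def protect_tags(segment):
    """Replace inline tags with numbered placeholders before sending text to the engine."""
    tags = TAG_PATTERN.findall(segment)
    plain = segment
    for i, tag in enumerate(tags):
        plain = plain.replace(tag, f"__TAG{i}__", 1)
    return plain, tags

def restore_tags(translation, tags):
    """Put the original tags back in place of the placeholders."""
    for i, tag in enumerate(tags):
        translation = translation.replace(f"__TAG{i}__", tag, 1)
    return translation

def translate_plain_text(text):
    # Stand-in for the actual plain-text MT engine call.
    return text.upper()

source = "Press <b>Save</b> to store the <i>current</i> document."
plain, tags = protect_tags(source)
print(restore_tags(translate_plain_text(plain), tags))
```

Real engines may reorder, drop or split such placeholders, which is why tag handling remains one of the main integration costs when moving an academic system into a production workflow.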
Back to my story: the company was planning to integrate machine translation technologies into its workflow.
One of the components of the translation system I was working with was a user-friendly tool developed for human translators to increase translation productivity. Once users figure out how it works and how to start the actual training, they do not need to worry about automation anymore and can concentrate on other aspects, such as the efficient use of language data and post-editing of machine translation output (customized post-editing guidelines and system customization).
Since the core of the system was hidden, it worked mostly as a black box: all I did was submit an input file to the system and receive an output, which allowed me to focus on other steps in the translation pipeline.
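In practice, working with such a black box boils down to something like the sketch below; the command name and flags are purely hypothetical placeholders, since the real tool and its interface were proprietary:

```python
import subprocess

def translate_file(input_path, output_path):
    """Submit an input file to a (hypothetical) engine binary and collect the output file."""
    # 'mt-engine' and its flags are placeholders, not the actual tool used at Wordflow.
    subprocess.run(
        ["mt-engine", "--input", input_path, "--output", output_path],
        check=True,
    )
    return output_path

# Example call (commented out because the binary is fictitious):
# translate_file("segments_de.txt", "segments_en.txt")
```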
During my internship at Wordflow, one of my tasks was to investigate the possibilities of integrating Moses into the company's translation workflow. This task required a full understanding of each step in the translation environment, from the moment I opened a new file to the translation memory update.
However, in an industrial environment, this process looked different: I was familiar with the core system and knew how to modify it, but now I had to take into account all the consequences of any modification. The main questions were how my modification would influence the translation process and translation time, and what the costs, savings and effort would be.
Last but not least: in academia, if you use an NLP tool, you usually do not care about the license or terms of use; you simply download it and run it on your input (the majority of NLP tools are free for educational use). Once you switch to industry, you have to be careful with licenses. It is advisable to check the terms of use of any tool before starting to use it.
Now that the internship is over, I am searching for a place where I can apply my knowledge to tasks in the areas of linguistics, statistical machine translation and natural language processing. To be more precise, I would like to conduct research, run linguistic experiments and implement the latest technologies to improve the performance and quality of machine translation output.
Please contact me at irina.popova.28.04@gmail.com