Grammatical Error Correction (GEC) is the task of automatically correcting grammatical errors in text. Approaches to GEC based on Machine Translation (MT) – where MT systems carry out a monolingual translation from a text containing errors into a text that is free of errors – have produced very competitive results in recent years.
This internship project aims to explore innovative ways to combine different MT systems in order to create a high-quality semi-automatic error correction system. The long-term plan is to integrate the system into a manual annotation tool that assists linguists in annotating learner corpora by offering corrections for spans of text manually marked as containing errors.
The internship project will take a two-step approach in which (i) several MT systems (including Automatic Post-Editing models) are trained on specific learner corpora, and (ii) the best output is then selected by a machine-learning-based ensemble component trained to detect and filter out unreliable error corrections.
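The selection step (ii) can be pictured with a minimal sketch. It is only an illustration under stated assumptions, not the project's design: the names `Candidate` and `select_correction`, the `reliability` scoring function, and the threshold value are all hypothetical, and the toy scores stand in for whatever a trained ensemble component would actually predict.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Candidate:
    system: str      # hypothetical name of the MT/APE system that produced it
    correction: str  # proposed replacement for the manually marked span


def select_correction(
    candidates: Sequence[Candidate],
    reliability: Callable[[Candidate], float],
    threshold: float = 0.5,  # assumed cut-off; a real value would be tuned
) -> Optional[Candidate]:
    """Return the most reliable candidate, or None if all are filtered out."""
    # Score every candidate with the (assumed) reliability estimator.
    scored = [(reliability(c), c) for c in candidates]
    # Filter step: drop corrections judged unreliable.
    kept = [(score, c) for score, c in scored if score >= threshold]
    if not kept:
        return None  # offer no correction rather than an unreliable one
    # Selection step: keep the highest-scoring remaining candidate.
    return max(kept, key=lambda pair: pair[0])[1]


# Toy usage: fixed scores replace a trained model (illustrative values only).
toy_scores = {"nmt": 0.82, "ape": 0.64, "smt": 0.31}
candidates = [
    Candidate("nmt", "He goes to school."),
    Candidate("ape", "He go to the school."),
    Candidate("smt", "He going to school."),
]
best = select_correction(candidates, lambda c: toy_scores[c.system])
```

Returning `None` when every candidate falls below the threshold reflects the semi-automatic setting: in such a design the annotator would simply see no suggestion for that span.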
The internship will focus on a small, carefully selected subproject. The number and complexity of the experiments will be compatible with a work plan of 3-4 months, ideally leading to a scientific publication.
The internship will be jointly led by the Institute for Applied Linguistics (IAL) at Eurac Research, Bolzano, and the “Machine Translation” Research Unit at Fondazione Bruno Kessler (FBK), Trento. The selected intern will be based at Eurac Research in Bolzano and will spend a period of time at FBK.
EURAC and FBK
Candidates should:
– have a good background in deep learning and maths
– have strong programming skills
– have excellent English communication skills, both oral and written
– ideally have some knowledge of machine translation and linguistics
Candidates should indicate whether they are eligible for Erasmus+ internship funding.