Release of Autodesk Post-Editing Data Corpus

April 28, 2015

It is my pleasure to announce the release of the Autodesk Post-Editing Data corpus with the ISLRN 290-859-676-529-5 (http://www.islrn.org/resources/identify_islrn/).

This resource contains parallel English source–MT/TM target segments post-edited into several languages (Simplified and Traditional Chinese, Czech, French, German, Hungarian, Italian, Japanese, Korean, Polish, Brazilian Portuguese, Russian, Spanish), with between 30,000 and 410,000 segments per language. Its main intended use is research in automatic quality estimation of Machine Translation output. The data are predominantly software user manual content, with some segments coming from marketing and education materials. They cover the portfolio of Autodesk products from various domains, notably architecture, engineering, civil engineering, simulation, computer graphics, and media and entertainment. The content was translated between 12 November 2012 and 23 September 2014.

The corpus is available from https://autodesk.box.com/Autodesk-PostEditing and more information is available in the included Readme file. The data are released under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (http://creativecommons.org/licenses/by-nc-nd/4.0/).

Regards,

Dr. Ventsislav Zhechev
Computational Linguist, Certified ScrumMaster®
Platform Architecture and Technologies
Localisation Services

MAIN +41 32 723 91 22
FAX +41 32 723 93 99

http://VentsislavZhechev.eu

Source: Moses Support

Tags: Computational Linguistics, Machine Translation
