The PARSEME-FR (http://parsemefr.lif.univ-mrs.fr/doku.php) project offers a 1.5-year post-doc position in Natural Language Processing, starting in April 2018. Candidates should send their application before February 1st, 2018 (see contact information below).

* Location: to be discussed with the members of the PARSEME-FR consortium (Nancy, Orléans or Paris)
* Employer: University of Orléans
* Contract: fixed-term position
* Remuneration: approx. 2,300 € per month net income (in addition to the salary, the contract includes health benefits)

## Topic:

**French MultiWord Expressions representation and parsing**

Many NLP applications require a fine-grained representation of the syntactic (and sometimes semantic) structure of texts. The process of building such a representation is called deep parsing. Recent work combining symbolic and data-driven techniques has led to significant advances in this field, notably in terms of robustness and efficiency. However, multiword expressions (MWEs), that is, groups of (not always continuous) words that exhibit some idiosyncratic properties, such as “hot dog”, “hard disk”, “kick the bucket” or “pay attention”, remain a major bottleneck for deep parsing (Sag et al. 2001, Baldwin and Kim 2010). This is due, among other things, to their unpredictable behavior at several levels (irregular morpho-syntax, non-compositional semantics, …) and to the lack of annotated training data.

One of the goals of the PARSEME-FR project is to improve the support of MWEs in French parsing. To this end, four work packages (WPs) have been defined, dealing respectively with (i) MWE annotation in texts or treebanks, (ii) MWE lexicons, (iii) statistical MWE parsing and (iv) symbolic MWE parsing. The recruited post-doc will work in the last WP. Two complementary aspects will be considered:
– the representation of MWEs in linguistic resources (including electronic grammars, see e.g. (Abeillé, 2002)),
– the use of these MWE-aware resources in deep (symbolic and hybrid) parsing (see e.g. (Foth and Menzel, 2006)).

Among existing resources for French, one may cite FRMG (FRench MetaGrammar), a linguistically motivated, abstract and modular description of the syntax of French (De La Clergerie, 2010). FRMG has been successfully used to compute deep representations of French texts. The first phase of the postdoctoral project will consist in extending the expressive power of metagrammars to provide compact representations of MWEs. A further step will consist in extending FRMG with information about MWEs automatically extracted from treebanks (e.g. syntactic or lexical constraints, distributional information, etc.) and from external resources (e.g. lexicons and grammars).

This extension of the linguistic description fed to the parser may raise efficiency issues. Indeed, the larger the input grammar, the larger the parsing search space (due to syntactic and/or lexical ambiguities). To control the exploration of this search space, several techniques have been proposed, including A* algorithms for MWEs (Waszczuk et al., 2017). The second phase of the postdoctoral project will focus on extending existing algorithms dedicated to MWE parsing and applying them to the DyALog engine used to run FRMG (De La Clergerie, 2013).



* PhD in computer science or computational linguistics
* Interest in linguistics and familiarity with language technology
* Capacity to work independently and as part of a team

Language requirements:

* Good knowledge of French and English (not necessarily native)

Specific requirements:

* Duration: 18 months, starting in April 2018 (open until filled)

Educational level:

PhD
