Overview

The thesis takes place within the Descartes project
(https://www.cnrsatcreate.cnrs.fr/descartes/),
a large France-Singapore collaboration project on applying AI to urban
systems.
The project will generate large amounts of data about artificial systems
deployed in the wild, part of which will be textual (expert reports, user
reactions, news coverage, social media conversations). Natural language
processing (NLP) models can help make this volume of information accessible,
but operators, policy makers and public institutions need to understand the
reasons behind the models’ behaviours and the information they extract, in
order to evaluate potential issues (accuracy, fairness, biases).
This thesis will investigate methods designed to explain machine learning
systems typically used in NLP, while integrating an interactive process
with the users of the system.

**Thesis subject**:
Modern machine-learning-based AI systems, while achieving good results
on many tasks, still appear as “black-box” models, in which it is
difficult to trace the path from the input (a text, an image, a set of
sensor measurements) to the decision (classification of a document, an
image, a situation).
The issue of explainability poses two different problems: (1) what is a
good explanation, and specifically what is a good explanation in the
context of textual models?
and (2) how to scale existing explanation methods to the kind of models
used in NLP tasks?
About (1), existing methods for image classification or tabular data
tend to rely on extracting a set of pixels or features that is
sufficient for generating the prediction, or that increases its
probability. This is less straightforward for textual input, which
consists of words whose meanings are interrelated in a given context
(for instance, “good” in a review could indicate that the review is
positive … unless it is preceded by “not”). The first problem of this
thesis will therefore be to provide humanly acceptable explanations of
simple text classifiers, such as those foreseen for the detection tasks
in the dedicated sub-project of Descartes.
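To make the “good” / “not” example concrete, here is a minimal sketch of a subset-style explanation for a toy text classifier. Everything in it is an illustrative assumption rather than a method of the project: the lexicons, the `classify` and `minimal_sufficient_subsets` names, and the use of masking (replacing a word by `_`) as a stand-in for a formal notion of sufficiency.

```python
from itertools import combinations

# Hypothetical toy lexicons, chosen only for this illustration.
POSITIVE = {"good", "great"}
NEGATIVE = {"bad", "boring"}
NEGATORS = {"not", "never"}


def classify(tokens):
    """Toy sentiment classifier: each lexicon word counts +1 or -1,
    and its polarity is flipped when the previous token is a negator."""
    score = 0
    for i, tok in enumerate(tokens):
        polarity = (tok in POSITIVE) - (tok in NEGATIVE)
        if i > 0 and tokens[i - 1] in NEGATORS:
            polarity = -polarity
        score += polarity
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def minimal_sufficient_subsets(tokens):
    """Return the smallest sets of words that, when all other words are
    masked with "_", still yield the original prediction.
    Exhaustive search: exponential in the number of tokens."""
    target = classify(tokens)
    n = len(tokens)
    for size in range(n + 1):
        found = []
        for kept in combinations(range(n), size):
            masked = [tokens[i] if i in kept else "_" for i in range(n)]
            if classify(masked) == target:
                found.append([tokens[i] for i in kept])
        if found:
            return found
    return []


print(minimal_sufficient_subsets("the food was good".split()))
print(minimal_sufficient_subsets("the food was not good".split()))
```

In the first sentence the single word “good” is enough to preserve the prediction, whereas in the second neither “not” nor “good” suffices alone and the explanation has to keep both words. The brute-force search over subsets is only workable for very short inputs.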
About (2), modern NLP models are based on very large and complex
architectures, such as the transformer family. Logically sufficient or
causally satisfying explanations are difficult to obtain in such cases,
as both kinds of methods suffer from scalability problems. We will
therefore explore heuristics, based on our solution to the first
problem, that guide an interactive procedure between the explainee (the
person requesting the explanation) and the ML system whose predictions
should be explained.
We will evaluate the procedure with the users targeted by the use cases
of the project. Brian Lim from NUS Singapore will help design the
validation experiments.

Company:

University of Toulouse

Qualifications:

A background in Computer Science and/or Machine learning.
Familiarity with, or a willingness to become familiar with, both
model-based and model-agnostic explanation paradigms that use either
logical or statistical methods.
A familiarity with NLP / dialogue would be a plus.
Given the nature of the project, the student should be open to working in a
cross-disciplinary environment and have good English communication skills.

Educational level:

Master's degree
