Research topic: Deep Multimodal Models for Empowering Audio-Visual

Start Date: ASAP

Duration: 3 years (duration of the thesis)


The overall objective of this research program is to develop novel methods
and tools for digital storytelling. To this aim, an improved scientific
understanding of multimodal media content analysis, linking and consumption
will be developed. This PhD program addresses more specifically the
following topics:

• Combine the best available knowledge in machine processing,
machine learning and human editing of verbal description, to industrialise
the process of digital storytelling and re-use and re-purposing of existing
media as new resources by both the media producers and the media consumers.

• Develop state-of-the-art techniques for analyzing audio-visual
content (including text), so that multimodal data can be extensively
described. The extracted descriptions will serve to structure and
semantically annotate large archives of audiovisual data and to better
understand their content and evolution.

• Study and implement temporal segmentation approaches that take
context and content into account in order to define in a precise and
localized way (temporally and possibly spatially) the semantic fragmentation
of audio-visual documents.

• Investigate and evaluate automatic methods for detecting key
moments and identifying relevant hyperlinks in audio-visual content, both in
the context of the project and in international benchmarking events.

This PhD position is funded by the MeMAD H2020 European project:
www.memad.eu. MeMAD stands for Methods for Managing Audiovisual Data and
aims to develop automatic language-based methods for managing, accessing and
publishing pre-existing and originally produced Digital Content in an
efficient and accurate manner within the Creative Industries, especially in
TV broadcasting and in on-demand media services. “Digital Content” contains
audiovisual material along with various ‘ancillary’ texts such as captions,
descriptions in different languages, and hyperlinks to related content,
similar to what hypertext is to plain text. More specifically, MeMAD aims to
develop methods and models for producing enhanced digital audiovisual
information in multiple languages and for various use contexts and
audiences, and to industrialize these results with demonstrable proofs of
concept. These objectives will be implemented through a number of
work-packages and project-wide use cases which will also serve as additional
ways to measure our success in reaching the objectives and expected impacts.




Education Level / Degree: MSc (with distinction)

Field / specialty: Computer Science

Technologies: Machine Learning / Deep Learning / Computer
Vision / A.I.

Language requirements: English and French

Educational level: Master's degree




EURECOM is a graduate school and research centre in communication systems located in the Sophia Antipolis technology park (French Riviera), a major European hub for telecommunications activities.

It was founded in 1991 as a consortium (GIE), a structure that allowed EURECOM to build a large network of renowned academic and industrial partners. The “Institut Mines Telecom” is a founding member of the EURECOM consortium. EURECOM research teams are made up of international experts, recruited at the highest level, whose work is regularly honored and has earned international recognition.

EURECOM is particularly active in research in its areas of excellence while also training a large number of doctoral candidates. Its contractual research is recognized across Europe and contributes largely to its budget. Projects are at the heart of EURECOM’s research activity (around 100 contracts are managed each year). One of their many benefits is the wealth resulting from collaborative work with partners.

EURECOM is part