More and more information retrieval (IR) environments involve contextual data, either social information, such as user profiles or link data on users’ connections, or information pertaining to the tasks users are involved in. In the context of support centers, retrieving solutions in response to the problems expressed by user queries can be formulated as a contextual IR problem in which user queries are enriched with data collected from different hardware components that are monitored over time.
In many cases, however, the available contextual information is noisy, and there is no obvious way to separate relevant contextual information from non-relevant information. In social media, for example, user profile information is unimportant for general queries but important for queries specific to a user. Similarly, in support centers, the majority of the collected system monitoring data is not pertinent to retrieving the documents and solutions relevant to a given user query. In all these cases, users’ information needs are made up of two parts: a “standard” query and (noisy) contextual information.
The goal of this PhD is to develop models and methods that can (a) identify the contextual elements pertinent to users’ information needs so as to (b) efficiently retrieve the relevant information, and (c) predict users’ future needs from their context, with all these operations performed at very large scale. Among the possible approaches to these problems, those based on latent probabilistic models are particularly interesting, as they can capture implicit dependencies between elements of different types (such as textual and numerical contextual elements). In addition, as contextual information is usually temporal, temporal versions of these models need to be developed. Lastly, it is crucial to develop solutions that can deal with huge amounts of data. The models and methods developed will be evaluated in two settings: the first corresponds to social information retrieval based on users’ profiles and network data; the second corresponds to search in support centers, where standard queries are associated with information monitored from users’ computing environments.
The successful PhD student will work on all of the above aspects. He/she will also work on the deployment of such models in Big Data architectures, mainly based on Apache Spark environments, in collaboration with the ERODS team of LIG (erods.liglab.fr).
University of Joseph Fourier