Natural language documents in patient records contain information that is valuable for the management of patient data. However, this information first needs to be unlocked from the texts. In this project we focus on the extraction of entities and events from documents such as discharge letters and patient reports. More specifically, we will study the extraction of entities (e.g., symptoms, body parts, diseases, drug names), events (e.g., medical actions performed on the patient), and relations between entities or events. These include spatial relations (e.g., between a medical condition and a body part), temporal relations (e.g., the order in which symptoms appeared), and causal relations (e.g., between symptoms and disease, or between treatment and disease).
The scientific challenges are twofold: analyzing language that is not well formed, both lexically and syntactically, and on which morpho-syntactic analysis largely fails; and extracting the information with a limited amount of training data. The targeted methods are semi-supervised learning and distant supervision, taking into account the most recent advances in distributional semantic models. The records will be written mostly in Dutch, but they might contain statements in other languages such as English, which adds a further level of complexity.
The ideal candidate has completed or is about to complete a master's degree in computer science or a similar discipline and is acquainted with natural language processing and statistical machine learning. The candidate has a strong interest in representation learning and information extraction. Outstanding results in prior studies are required.
The candidate is fluent in spoken and written English and reads Dutch fluently.