Overview
– NLP model development: Develop multilingual information
extraction models for the biomedical domain, including mention extraction and
linking of terms to controlled terminologies. Pre-train cross-lingual
large language models for healthcare.
– Technical project coordination: Coordinate technical contributions
from different partners in technological projects.
– Documentation and reporting: Create technical reports and project
documentation in both English and Spanish.
– Scientific writing: Collaborate in drafting technical research
proposals and writing scientific papers.
Company:
Barcelona Supercomputing Center
Qualifications:
Education
University degree in computer science, mathematics, statistics, chemical engineering, materials engineering, data science, physics, bioinformatics, telecommunications, electrical engineering or equivalent.
Essential Knowledge and Professional Experience
Experience with deep learning and statistical data mining frameworks: Keras, TensorFlow, PySpark, PyTorch, spaCy, etc.
Experience with ML algorithms and techniques: LDA, Topic Modelling, LSTM, KNN, SVM, Decision Trees, Clustering, Word Embeddings, etc.
Experience in the development or management of software resources and tools, including GitHub and GitHub Projects.
Experience with NLP components and platforms.
Experience with named entity recognition and entity linking methodologies.
Additional Knowledge and Professional Experience
Strong programming skills in at least one of the following languages: Python, C++, Scala, R, Java.
Experience and skills related to Bash, Docker, Kubernetes, unit testing, and Colab.
Competences
Interest in biomaterial sciences, biomedicine and related application domains.
Good communication and presentation skills.
Strong technical writing skills.
Ability to work both independently and within a team.
Accustomed to working under pressure and to strict deadlines.
About Barcelona Supercomputing Center
The Barcelona Supercomputing Center is a scientific institution specializing in high-performance computing and big data.