Our team at Elsevier builds and integrates matching and disambiguation systems for research publications data. These systems automatically construct the authoritative publication and citation graph that is a major building block of widely used products, including Elsevier’s Scopus. We are looking for a Principal Data Scientist with 5-10 years of experience, especially in NLP and Text Analytics, to lead our data science effort.
Responsibilities:
Advance the development of our systems for disambiguation, tagging, matching, linking, clustering, and deduplication. Our goal is industry-leading accuracy of our publications data, our citation graph, and the systems that build these.
Work with product managers and other stakeholders to clarify new use cases and data sources, and translate these into data science requirements for the systems we build.
Provide leadership to ensure high-accuracy results as we expand the set of products and content sources within the scope of our matching and disambiguation systems.
Work closely with software development teams to ensure that systems are aligned with data science requirements, and that algorithms scale and meet accuracy requirements.
Lead our data science team toward new and improved models, and identify opportunities for improved workflows and automation. Prioritize and focus our data science efforts.
Work with our content evaluation teams to build large-scale gold sets that are the basis for measuring and improving the recall and precision of our systems.
Advance the reliability and flexibility of our workflows, to increase our ability to rapidly incorporate new content while continuing to deliver high accuracy.
Work cooperatively with team members around the world.
Skills and Behavioral Attributes:
An M.S. or Ph.D. in Computer Science, Data Science, or a closely related field, with sound technical expertise relevant to the areas mentioned above.
5-10 years of experience working on industrial data science projects that have reached the commercial production stage.
Knowledge of and experience with NLP/Text Analytics.
Knowledge of and experience with AWS technologies and with building data science processes around them (MLflow, Spark, etc.).
Prefer candidates able to develop production-ready code and understand related processes, such as automated testing.
Prefer candidates with experience in scholarly research publications data.
Level of experience (years):
Senior (5+ years of experience)