Overview
Innovate & Deliver: Design, build, test, and deploy end-to-end AI search solutions using neural information retrieval techniques, semantic and hybrid search, and re-ranking approaches. Develop models for information retrieval, semantic search, document re-ranking, and query understanding, including dense retrieval architectures, semantic chunking models, embedding models, cross-encoders, SLM re-rankers, and transformer-based LLM-driven approaches. Collaborate with Engineering to ensure well-managed software delivery and reliability at scale.
Evaluate & Optimize: Develop comprehensive data and evaluation strategies for both component-level and end-to-end quality, leveraging expert human annotation and synthetic data generation. Apply robust training and evaluation methodologies to optimize retrieval quality and latency.
Drive Technical Decisions: Independently determine appropriate retrieval architectures, indexing strategies, ranking models, data, and evaluation strategies for IR and NLP problems. Solve search relevance, ranking, and scalability challenges in a self-directed manner while contributing effectively as part of a multidisciplinary team.
Align & Communicate: Partner closely with Engineering and Product to translate complex challenges into scalable, production-ready solutions. Engage stakeholders to deeply understand business problems and domains, shaping goals that align AI search capabilities with product needs and business objectives.
Advance the Field: Publish at top venues (e.g., SIGIR, ECIR, NeurIPS, ACL, EMNLP, ICLR) and contribute to patents to keep our solutions cutting-edge and competitive.
Company:
Thomson Reuters
Qualifications:
PhD in Computer Science, AI, or a related field, or a Master’s with equivalent research/industry experience.
3+ years of hands-on experience building and deploying modern search or RAG systems with neural retrieval methods and deep learning models for NLP.
Strong background in information retrieval fundamentals, including indexing, query processing, ranking and relevance modelling.
Strong programming skills (e.g., Python) and experience with modern deep learning frameworks (e.g., PyTorch, DeepSpeed, Torchtune, LlamaFactory).
Proven ability to translate complex problems into innovative AI applications.
Publications at relevant venues such as SIGIR, ECIR, NeurIPS, ACL, EMNLP, ICLR.
Technical Qualifications
Deep understanding of lexical and neural information retrieval fundamentals: BM25, hybrid search, dense retrieval (e.g., DPR, ColBERT), cross-encoders, bi-encoders, and late interaction models.
Hands-on experience designing and implementing search or RAG systems: vector databases, retrieval strategies, document chunking, metadata filtering, hybrid search, re-ranking, context optimization, and orchestration.
Experience developing relevant datasets and evaluation frameworks.
Solid understanding of ML and deep learning approaches for NLP.
Solid understanding of, and experience with, post-training of large language models and their application to retrieval systems.
Preferred Qualifications
Extensive prior work on search, question answering, or RAG over large corpora and long documents, including experience with legal or enterprise search systems.
Experience with multi-stage or agentic retrieval architectures and query understanding for complex information needs.
Experience building applications for the legal domain (e.g., legal search, case law retrieval, precedent finding, document review, document drafting).
Publications at relevant venues such as SIGIR, ECIR, NeurIPS, ACL, EMNLP, ICLR.
Educational level:
PhD
Level of experience (years):
Mid Career (2+ years of experience)
About Thomson Reuters
Thomson Reuters delivers critical information from the financial, legal, accounting, intellectual property, science, and media markets.