Overview

We are a dynamic and innovative small data company specializing in language data products and services. Our team of 18 is distributed across two offices, in Amsterdam and Thessaloniki.

Position Overview:

We seek a skilled and motivated Data Engineer to join our team and design, develop, and maintain robust, scalable NLP and data-intensive applications. This is a fantastic opportunity for someone who is passionate about natural language processing and the booming field of applied AI, thrives in a collaborative environment, and is eager to contribute actively to the success of our efforts.

Responsibilities:

Develop and implement multilingual multimodal (text, speech, video) data collection pipelines. This includes overseeing large-scale data acquisition through web crawling, scraping, and other data-gathering techniques.
Contribute to the data optimization of machine learning algorithms to address intricate NLP challenges and elevate product capabilities.
Take charge of preprocessing and cleaning activities, and ensure the quality and reliability of multimodal data.
Extract and utilize meaningful features from both structured and unstructured data to increase model precision and robustness.
Collaborate with cross-functional teams to seamlessly integrate machine learning and data solutions into TAUS products.
Collaborate with business partners to understand data requirements and collection needs, particularly for EU projects.
Ensure data compliance with relevant regulations and best practices for data privacy and security.

Company:

TAUS

Qualifications:

University degree in Computer Science, Data Science, AI, Computational Linguistics, Machine Learning, or a related field.
2-3 years of relevant work experience with a strong emphasis on data collection and pipeline management.
In-depth knowledge of, and hands-on experience with, building and managing large-scale data collection systems, including web crawlers and scrapers.
High proficiency in Python and familiarity with ML and data libraries such as PyTorch, NumPy, pandas, and scikit-learn.
Solid understanding of Transformer architectures and current trends in deep learning.
Experience with data modeling tools and cloud platforms (AWS, Azure, GCP), plus familiarity with best practices for data governance, quality, and security.
Excellent problem-solving and communication skills. Our working language is English.
The candidate should be based in the Netherlands or Greece.

Educational level:

Diploma

Level of experience (years):

Mid Career (2+ years of experience)

About TAUS

The Gateway to the World of Language Data