Cambia’s Chief AI Office delivers innovative data and technology products, services, and solutions that will help drive Cambia to its 2020 vision of person-focused health care transformation.
We are looking for a passionate, talented, and inventive NLP Data Scientist to help build industry-leading speech and language solutions. Working with a highly multidisciplinary team of scientists, engineers, strategic partners, and subject-matter experts, you will help build a real product with natural language processing and machine learning at its core.
Essential Functions of the NLP Data Scientist
Utilize statistical natural language processing to mine unstructured data and create insights.
Build and optimize cutting-edge natural language understanding systems such as conversational agents (chatbots).
Build core in-house NLP components and analytical tools such as document clustering, topic analysis, text classification, named entity recognition, sentiment analysis, and part-of-speech tagging methods for unstructured and semi-structured data.
Identify and deploy existing machine learning, natural language processing, and information retrieval techniques and systems for knowledge management and discovery; for example, using Electronic Medical Records (EMR) data, progress notes, and discharge summaries to identify the admitting diagnosis, reason for consultation, clinical history, and similar information.
Identify ways to analyze consumer experiences across communication channels and to improve customer satisfaction.
Cluster and analyze large volumes of user-generated content, and process data at scale on Amazon Web Services (AWS) using tools such as EC2, Amazon EMR (Elastic MapReduce), MapReduce, and PySpark.
Integrate the NLP pipeline into the production environment, ensure its scalability, and apply the knowledge gained to other projects, models, and work practices.
Design novel algorithms, drawing on data cleaning, feature selection, statistical modeling, clustering and classification, text processing, and other machine learning techniques, to solve complex problems presented by healthcare organizations.
Collaborate with cross-functional teams within Cambia and with external partners to solve problems in healthcare.

Qualifications
BS/BA degree (or equivalent experience) in a strongly quantitative field such as Big Data, Computer Science, Data Analytics, Engineering, Applied Mathematics, Statistics, Physics, Operations Research, Econometrics, or a related field, ideally with a specialization in NLP. Master’s or PhD degree preferred.
Strong analytical and problem-solving skills, including the ability to apply quantitative techniques such as forecasting, descriptive statistics, statistical inference, and multivariate modeling to business situations.
Experience with a broad range of NLP techniques, including text processing, tokenization, part-of-speech (POS) tagging, parsing, annotation, regular expressions, and language modeling.
Ability to develop prototypes by manipulating and analyzing complex, high-volume, high-dimensional data from various sources.
Expertise in producing, processing, evaluating, and utilizing unstructured/semi-structured data.
Proficiency in open-source NLP and machine learning toolkits such as Stanford CoreNLP, NLTK, Gensim, Mallet, OpenNLP, LingPipe, cTAKES, scikit-learn, NumPy, LIBSVM, MLlib, Theano, and TensorFlow.
Solid background in statistical learning and clustering techniques for NLP such as HMM, CRF, SVM, MaxEnt, LDA, LSI, and K-Means.
Must have ML/NLP algorithm implementation experience, as well as the ability to modify standard algorithms, e.g., by changing objective functions, working out the math, and implementing the result.
Practical ability to visualize data, communicate about data, and utilize data effectively.
Proficiency with SQL (relational) and/or NoSQL databases.
Ability to think creatively and to work well both as part of a team and as an individual contributor.
Eager to learn new algorithms, new application areas, and new tools.
Excellent oral and written communication skills for working effectively with a broad range of internal and external contacts, including leadership.
Strong programming skills in at least one object-oriented programming language, such as Java, Python, C++, or Scala.
Fluency with Linux/Unix.
The Following Skills, Experience, and Knowledge Are a Plus
Expertise in one or more of the following areas: question answering, conversational agents (chatbots), entity/relation extraction, summarization, semantic search, information retrieval, and knowledge bases.
Experience with, and/or motivation to work on, modern deep learning approaches to NLP, such as word and paragraph embeddings and representation learning.
Basic knowledge of core linguistic concepts, such as phonology, morphology, syntax, and semantics.
Experience with noisy and/or unstructured textual data, such as tweets and search queries.
Knowledge of, or experience with, building production-quality natural language processing and machine learning applications and deploying them at scale.
Experience with large-scale data analysis tools in a cloud environment, such as Spark, Hadoop, MapReduce, Hive, and Pig.
Experience with open-source search engines such as Elasticsearch, Solr, or Lucene.
Demonstrated knowledge of health plan operations, medical terminologies/ontologies and/or clinical informatics and healthcare systems.
Experience with text analysis in clinical and medical domain corpora like Electronic Medical Records (EMR).
Knowledge of REST APIs and of web and visualization technologies such as HTML, CSS, JavaScript, and D3.js.
General software development skills (source code management, debugging, testing, deployment, etc.).
Publications in NLP/IR academic conferences or journals, or in industry venues, such as ACL, EMNLP, NAACL, EACL, COLING, SIGIR, and WWW.

About Cambia Health Solutions

