The three levels of NLP for your business
In recent years, the tech world has seen a surge of NLP applications in areas including adtech, publishing, customer service and market intelligence. According to Gartner’s hype cycle, NLP reached the peak of inflated expectations in 2018, and many businesses see it as a “go-to” solution for generating value from the 80% of business-relevant data that comes in unstructured form. To put it simply: NLP is widely adopted, with wildly variable success (let’s assume a working definition of success in terms of quality and ROI).
In this article, I share some practical advice for the smooth integration of NLP into the tech stack of your company. The advice summarizes some of the experience I accumulated on my journey with NLP — from academia, through a number of industry projects, and ending up with my own company which develops NLP-driven applications for international market intelligence. The article does not provide technical details but focusses on organisational factors including hiring, communication and expectation management.
Before starting out on NLP, you should meditate on two questions:
1. Is a unique NLP component critical for the core business of our company?
Example: Imagine you are a hosting company. You want to optimise your customer service by analysing incoming customer requests with NLP. Most likely, this enhancement will not be part of your critical path activities. By contrast, a business in targeted advertising should try to make sure it does not fall behind on NLP — this could significantly weaken its competitive position.
2. Do we have the internal competence to develop IP-relevant NLP technology?
Example: You have hired and successfully integrated a PhD in Computational Linguistics and can grant her the freedom to design new solutions for your business issues. She will likely be motivated to enrich the IP portfolio of your company. However, if you are hiring mid-level data scientists without a clear focus on language and need to split their time between data science and engineering tasks, don’t expect a unique IP contribution. Most likely, they will fall back on ready-made algorithms because they lack the time and the mastery of the underlying details.
Hint 1: if your answers are “yes” and “no”, you are in trouble! You’d better identify technological differentiators that do match your core competence.
Hint 2: if your answers are “yes” and “yes”, stop reading and get to work. Your NLP roadmap should already be defined by your specialists to achieve the business-specific objectives.
If you are still here, don’t worry: the rest will soon fall into place. There are three levels at which you can “do NLP”:
- Black belt level, reaching deep into mathematical and linguistic subtleties
- Training & tuning level, mostly plugging in existing NLP/ML libraries
- Blackbox level, relying on “buying” third-party NLP
Let’s elaborate. The first, fundamental level is our “black belt”. It comes close to computational linguistics, the academic counterpart of NLP. The folks here often split into two camps: the mathematicians and the linguists. The camps might well befriend each other, but their mindsets and ways of doing things will still differ. The math guys are not afraid of things like matrix calculus and will thrive on the details of the newest methods of optimisation and evaluation. At the risk of leaving out linguistic details, they will generally take the lead on improving the recall of your algorithms. The linguists were raised either on highly complex generative or constraint-based grammar formalisms, or on alternative frameworks such as cognitive grammar, which give more room to imagination but also allow for formal vagueness. They will gravitate towards writing syntactic and semantic rules and compiling lexica, often needing their own sandbox and taking care of the precision part. Depending on how you handle communication and integration between the two camps, their collaboration can either block productivity or open up exciting opportunities.
In general, if you can inject a dose of pragmatism into the academic perfectionism of these folks and make them buy into the idea of serving mortal customers who don’t really know how NLP works, you can potentially create a unique competitive advantage. If you can efficiently combine mathematicians and linguists on your team, even better! But be aware that you have to sell them on an honest vision and then follow through. Doing hard fundamental work without seeing its impact on the business would be a frustrating and demotivating experience for your team.
The second level involves training and tuning models using existing algorithms. In practice, most of the time will be spent on data preparation, training data creation and feature engineering. The core tasks, training and tuning, do not require that much effort. At this level, your people will be data scientists pushing the boundaries of open-source packages for NLP and/or machine learning, such as NLTK, scikit-learn, spaCy and TensorFlow. They will invent new and not always academically justified ways of extending training data, engineering features and applying their intuition for surface-side tweaking. The goal is to train models for well-understood tasks such as NER, categorisation and sentiment analysis, customized to the specific data at your company.
The good thing here is that there are plenty of great open-source packages out there that still leave you enough flexibility to optimize for your specific use case. The risk is on the side of HR: many roads lead to data science. Data scientists are often self-taught and have a rather interdisciplinary background. Thus, they will not always have the innate academic rigour of level 1. As deadlines or budgets tighten, your team might cut corners on training and evaluation methods, thus accumulating significant technical debt.
The third level is the “blackbox”, where you buy NLP. Your developers will mostly consume paid APIs such as Rosette, Semantria and Bitext, which provide standard algorithm outputs out of the box (cf. this post for an extensive review of existing APIs). Ideally, your data scientists will work alongside business analysts or subject matter experts to get maximal value from the analysed data. For example, if you are doing competitive intelligence, your business analysts will design a model which contains your competitors, their related technologies and products, and relates them to each other.
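To make the competitive intelligence example more concrete, here is a minimal sketch of what such a model could look like in Python. Everything in it is an assumption made for illustration: the `Company` class, the helper functions and the company, product and technology names are hypothetical, and a real model designed by your analysts will likely be richer.

```python
from dataclasses import dataclass, field


@dataclass
class Company:
    """A competitor with the products and technologies linked to it (hypothetical schema)."""
    name: str
    products: set = field(default_factory=set)
    technologies: set = field(default_factory=set)


def shared_technologies(a, b):
    """Technologies that two competitors have in common."""
    return a.technologies & b.technologies


def companies_using(companies, technology):
    """Names of all competitors linked to a given technology, sorted alphabetically."""
    return sorted(c.name for c in companies if technology in c.technologies)


# Invented sample data, for illustration only.
acme = Company("Acme", {"Acme Assistant"}, {"speech recognition", "chatbots"})
globex = Company("Globex", {"Globex Voice"}, {"speech recognition"})
competitors = [acme, globex]

if __name__ == "__main__":
    print(shared_technologies(acme, globex))
    print(companies_using(competitors, "chatbots"))
```

The entities and relations that the NLP API extracts from text would then be mapped onto a structure like this, so that analysts can query it instead of reading raw API output.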
At the blackbox level, make sure you buy NLP only from black belts! With this secured, one of the obvious advantages of outsourcing NLP is that you don’t run the risk of diluting your technological focus. The downside is a lack of flexibility: with time, your requirements will get more and more specific. The more deeply the API is integrated into your system, the higher the risk that it will stop satisfying your requirements. It is also advisable to invest in manual quality assurance to make sure the API delivers good quality for your specific data and use case.
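Manual quality assurance can start very small: have a person label a sample of your own data, run the same sample through the API, and compare the two. The sketch below shows one way to do this for a sentiment API using precision and recall. It is an illustration under assumptions, not a prescribed procedure: the `LabelledItem` class, the `precision_recall` function and the sample texts and labels are all invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LabelledItem:
    text: str
    manual_label: str  # label assigned by a human reviewer
    api_label: str     # label returned by the third-party API


def precision_recall(items, target):
    """Precision and recall of the API for one target label, measured against manual labels."""
    tp = sum(1 for i in items if i.api_label == target and i.manual_label == target)
    fp = sum(1 for i in items if i.api_label == target and i.manual_label != target)
    fn = sum(1 for i in items if i.api_label != target and i.manual_label == target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented sample: five customer requests with manual and API labels.
SAMPLE = [
    LabelledItem("Server down again", "negative", "negative"),
    LabelledItem("Still waiting for a reply", "negative", "negative"),
    LabelledItem("Great, another outage", "negative", "positive"),
    LabelledItem("Not bad at all", "positive", "negative"),
    LabelledItem("Thanks for the quick fix", "positive", "positive"),
]

if __name__ == "__main__":
    p, r = precision_recall(SAMPLE, "negative")
    print(f"precision={p:.2f} recall={r:.2f}")
```

Tracking these two numbers on a sample of your own data, rather than relying on the vendor’s benchmarks, shows whether the API holds up for your specific use case and whether it still does after the vendor updates its models.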
So, where do you start? Of course, it depends — some practical advice:
- Talk to your tech folks about your business objectives, let them research and prototype, and start out on level 2 or 3.
- Make sure your team doesn’t get stuck in the low-level details of level 1 too early. This might lead to significant slips in time and budget, since a huge amount of knowledge and training is required.
- Don’t hesitate: you can always consider a transition between levels 2 and 3 further down the path (by the way, this works in either direction). The transition can be efficiently combined with the generally unavoidable refactoring of your system.
- If you manage to build a compelling business case with NLP, welcome to the club! You can use it to attract first-class specialists and add to your uniqueness by working on level 1.
About the author: Janna Lipenkova holds a PhD in Computational Linguistics and is CEO of Anacode, a provider of tech-based solutions for international market intelligence.
Originally published at https://www.linkedin.com/pulse/three-levels-nlp-your-business-janna-lipenkova/