The Search Platform team at the Wikimedia Foundation (WMF) is seeking an expert consultant to partner with us in investigating and implementing Natural Language Processing (NLP) as part of our query analysis pipeline. We are soliciting proposals to add specific components focusing on two main NLP areas:

Analyzing spelling mistakes people might make when querying, and providing results based on the corrected spelling; and
Improving our “Did You Mean” suggestions, which offer search options similar to the determined query intent when there are few or no results.

Our current query analysis pipeline uses a set of algorithms to break queries down into tokens, which are then further processed to determine, as best we can, the intent of the query, so that we can provide the most relevant and best-ranked results across almost 300 languages. This is aided by a machine learning component in the form of a learning-to-rank plugin for Elasticsearch for the top 19 languages (by search volume). Adding NLP to our analysis chain will help us achieve greater search satisfaction, and this project will lay the foundations for more NLP work in the future.
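
To make the two focus areas concrete, here is a minimal sketch of a “Did You Mean” style suggestion, offered purely as an illustration and not as a description of our existing pipeline. It assumes a tiny hypothetical vocabulary (`VOCABULARY`), a whitespace tokenizer, and Python's standard `difflib` similarity matching; the function names and the `cutoff` value are assumptions made for this example, and a real proposal would likely rely on index statistics and language-aware analysis instead.

```python
from difflib import get_close_matches

# Hypothetical vocabulary of known terms (an assumption for this sketch);
# a real system would draw candidates from the search index.
VOCABULARY = ["encyclopedia", "elasticsearch", "language", "wikipedia", "search"]


def tokenize(query):
    """Naive tokenizer: lowercase the query and split on whitespace."""
    return query.lower().split()


def did_you_mean(query, vocabulary=VOCABULARY, cutoff=0.8):
    """Return a corrected query string, or None if nothing was changed.

    Each token that is not in the vocabulary is replaced by its closest
    vocabulary entry, provided the similarity ratio reaches `cutoff`.
    """
    corrected = []
    changed = False
    for token in tokenize(query):
        if token in vocabulary:
            corrected.append(token)
            continue
        matches = get_close_matches(token, vocabulary, n=1, cutoff=cutoff)
        if matches:
            corrected.append(matches[0])
            changed = True
        else:
            # No sufficiently similar term: keep the token as typed.
            corrected.append(token)
    return " ".join(corrected) if changed else None
```

Under these assumptions, `did_you_mean("wikipedai search")` suggests `"wikipedia search"`, while a query made up entirely of known terms yields no suggestion. Character-level similarity of this kind is only a baseline; it ignores query intent, term frequency, and language-specific behavior, which is where we expect NLP techniques to help.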


Project Requirements:

Ideally, the NLP work for this project should be either developed as a PHP module or encapsulated in its own Elasticsearch plugin, which will be incorporated into the pipeline with the help of the Search Platform engineering staff. We are open to other ideas as well, as long as we see a path to incorporating this work into the query analysis pipeline we currently run. The programming language(s) can be Java and/or PHP, and possibly Python if there is a need. We are willing to entertain other options, as long as we can safely incorporate them into our Elasticsearch-centered ecosystem of components. To be clear, however, Elasticsearch experience is not required, as we can help with any integration needed on that end.

Most importantly, respondents should have previous experience applying NLP to search and/or spelling correction. Beyond that, it would be great if you have experience with Elasticsearch, and with building testing and analysis components to help determine the effectiveness of query results as NLP techniques are applied. In your proposal, please indicate your prior experience and briefly summarize how you plan to approach this work, including your preferred programming language(s) and any expectations you have about the infrastructure required to support your direction.

We are fiercely dedicated to open source software at WMF, and all work completed needs to be made available under an open source license. No closed-source or proprietary solutions will be considered.

How to apply:

Please mention NLP People as a source when applying.



About Wikimedia Foundation

The Wikimedia Foundation encourages the development and distribution of free educational content with projects such as Wikipedia.