We are looking for a Ph.D. student interested in research that combines
machine learning and information retrieval. Given a large collection
of technical documents, we will apply machine learning techniques to construct
associations between the formulae and words used to explain mathematical
ideas, and determine how to translate automatically between those two forms of
expression. These text-math relationships will then be used to produce a
scalable math-aware search engine for a large public collection of technical
documents (CiteSeerX: http://citeseerx.ist.psu.edu/index).

These associations and translations between math and text will let students
describe what they are looking for in words and have the search engine find
documents that express the same ideas, even if only in mathematical
notation. This project also aims to allow scientists, engineers,
mathematicians, and students to locate technical information using words,
mathematical notation, or some of each. For example, a mathematician studying
graph theory could use these new capabilities to find related applications in
physics, ecology, and social network analysis, despite any differences in the
notation and terminology used in those disciplines.

The Lab (www.cs.rit.edu/~dprl):

The Document and Pattern Recognition Lab at RIT has developed state-of-the-art
techniques for formula search, math-aware search interfaces, and automatic
recognition of handwritten and typeset mathematical notation. There are
currently two other Ph.D. students working on this project, along with a number
of Master’s-level Research Assistants.

This Ph.D. position is part of a larger project, funded by the NSF and the
Alfred P. Sloan Foundation, that aims to bring math search to the masses by
developing usable math-aware search engines that make locating and discovering
technical information simpler for both experts and non-experts.


