The School of Computing and Communications (SCC), within Lancaster University’s Faculty of Science and Technology, is seeking to appoint a Research Associate (RA) to work on the Natural Language Processing (NLP) for Canadian Annual Report Extraction (CARE) research project. CARE is funded by the Canadian organisation Mitacs (https://www.mitacs.ca/) and HEC Montreal (https://www.hec.ca/en/).

Working together with project partners at HEC Montreal (led by Dr Kim Trottier) and representatives from Chartered Professional Accountants (CPA) of Canada, the RA will develop a novel Python NLP tool that automatically detects the structure of PDF annual reports, extracts their text, and assesses readability and sentiment. The post is based in Lancaster, UK.

With CARE, users can upload the PDF of an annual report, run the program, and receive an output consisting of a set of disaggregated text files, one for each section of the annual report. These text files can be useful to investors, regulators, and researchers for performing text analysis. For example, an investor may want to examine a specific section that is relevant to their analysis, such as the chairman’s message or the auditor’s report. A regulator may be interested in assessing climate risk disclosure from the MD&A of all Oil and Gas companies. A researcher exploring goodwill impairment could extract and analyse only the notes on impairments for the firms in their sample. The gains from using CARE are not only in terms of transforming PDF files to text format, but also through an ability to process a large sample of annual reports all at once.

While users of CARE can develop and apply their own Natural Language Processing (NLP) algorithms to the disaggregated text files, CARE provides some of the more common metrics for quick analysis. The following information is produced for each text file: readability scores, tone measures, causal language metrics, and word-frequency counts. The frequency counts can be tailored to include keywords that are relevant to the user.
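
As an illustration of the kind of per-file metrics described above, the following is a minimal Python sketch and not CARE's actual implementation. It assumes the Automated Readability Index as the readability score, a simple positive-minus-negative word ratio as the tone measure, and exact-match keyword counts. The word lists, function names, and sample text are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists (an assumption for illustration);
# a real analysis would use a published financial sentiment dictionary.
POSITIVE = {"growth", "improved", "strong", "gain"}
NEGATIVE = {"loss", "impairment", "decline", "risk"}


def tokenize(text):
    """Lowercase the text and return runs of letters as word tokens (rough)."""
    return re.findall(r"[a-z]+", text.lower())


def count_sentences(text):
    """Count sentences by splitting on runs of '.', '!' or '?' (a heuristic)."""
    parts = [p for p in re.split(r"[.!?]+", text) if p.strip()]
    return max(len(parts), 1)


def automated_readability_index(text):
    """ARI = 4.71 * (characters / words) + 0.5 * (words / sentences) - 21.43."""
    words = tokenize(text)
    if not words:
        return 0.0
    chars = sum(len(w) for w in words)
    return 4.71 * chars / len(words) + 0.5 * len(words) / count_sentences(text) - 21.43


def tone(text):
    """Return (positive - negative) / (positive + negative), or 0.0 if neither occurs."""
    words = tokenize(text)
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return 0.0 if pos + neg == 0 else (pos - neg) / (pos + neg)


def keyword_counts(text, keywords):
    """Count occurrences of each user-supplied single-word keyword."""
    counts = Counter(tokenize(text))
    return {k: counts[k.lower()] for k in keywords}


if __name__ == "__main__":
    sample = "Revenue growth was strong. The impairment loss increased risk."
    print(automated_readability_index(sample))
    print(tone(sample))
    print(keyword_counts(sample, ["risk", "goodwill"]))
```

In this sketch each function takes the plain text of one extracted section, so the same three calls could be repeated over every text file in a batch. The sentence and word splitting are intentionally crude; a production tool would likely rely on a dedicated NLP tokenizer.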

In addition to being efficient, capital markets should strive to provide a level playing field. Institutional investors are able to create their own proprietary, in-house programs that extract and analyse the narrative portion of annual reports. CARE brings this functionality to all other market participants through a simple, open-source tool. By supporting the development of CARE, Canadian regulators are positioning themselves to be early movers in making annual report narratives accessible to a growing set of digital users among their constituents.

The RA will be part of an internationally recognised centre of expertise for corpus-based natural language processing (UCREL), and will work directly with Dr Mo El-Haj in SCC at Lancaster University and Dr Kim Trottier in the Department of Accounting at HEC Montreal. For more details, please see the associated job description and person specification for this position. Potential candidates can also make informal enquiries to Dr Mo El-Haj (Email) and Dr Kim Trottier (Email).

This is a 50% part-time position expected to start in November 2022. The RA will join on an indefinite contract; however, the role remains contingent on external funding, which for this position ends on 30th October 2023.

Lancaster University are committed to family-friendly and flexible working policies on an individual basis. The School is also an Athena Swan Bronze Award holder, driving good employment practice and initiatives to address gender inequalities in Computing higher education and research.


Lancaster University



How to apply:

Please mention NLP People as a source when applying

