Lectures on Language Technology, Uppsala, May 26, 2015
The computational linguistics group at Uppsala University is delighted to invite you to an afternoon of public lectures on language technology by leading experts in the field. The lectures will take place in Room 7-0043, English Park Campus, Uppsala University, on the 26th of May according to the schedule below. Attendance is free for anyone interested.
13.15-14.00 Jan Hajič<http://ufal.mff.cuni.cz/jan-hajic>
Institute of Formal and Applied Linguistics
Charles University in Prague
Abstract Meaning Representation across Languages
Abstract Meaning Representation (AMR) is a newly developed formalism for representing meaning, which abstracts away from syntax and some other phenomena but is (still) language-dependent. It has been developed by a consortium of mostly U.S. universities, with team members including Martha Palmer, Kevin Knight, Philipp Koehn, Ulf Hermjakob, Kathy McKeown, Nianwen Xue and others. In the talk, the basic facts about AMR will be presented, followed by a comparison between Czech and English carried out in detail on a small 100-sentence corpus; some examples from a Chinese-English comparison will also be shown. In addition, AMR will be compared to the deep syntactic representation used in the set of Prague Dependency Treebanks (again, for Czech and English), and observations will be made about the level of abstraction used in these two formalisms. Plans for future studies and for possible corpus annotation work in the near future will also be mentioned. The work reported has been done primarily by the CLAMR (Cross-Lingual AMR) team led by Martha Palmer of UCB at the Johns Hopkins Summer Workshop in 2014<http://www.clsp.jhu.edu/workshops/archive/ws14-summer-workshop>.
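To make the formalism concrete, here is a minimal illustration, not taken from the talk: the widely cited AMR for "The boy wants to go", written in PENMAN notation and as the equivalent set of relation triples. The variable names and the small Python wrapper are ours, purely for exposition.

    # The AMR for "The boy wants to go" in PENMAN notation.
    # Variables (w, b, g) are arbitrary instance labels; concepts such as
    # want-01 and go-01 come from PropBank-style framesets.
    amr_penman = """
    (w / want-01
       :ARG0 (b / boy)
       :ARG1 (g / go-01
                :ARG0 b))
    """

    # The same rooted, directed graph as (source, relation, target) triples.
    amr_triples = [
        ("w", "instance", "want-01"),
        ("b", "instance", "boy"),
        ("g", "instance", "go-01"),
        ("w", "ARG0", "b"),   # the boy is the wanter
        ("w", "ARG1", "g"),   # the wanted thing is the going event
        ("g", "ARG0", "b"),   # the boy is also the goer (reentrancy)
    ]

    for src, rel, tgt in amr_triples:
        print(f"{src} --{rel}--> {tgt}")

The reentrant variable b illustrates the abstraction from syntax: the boy is both the wanter and the goer, regardless of how the sentence encodes that control relation.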
14.00-14.45 Lilja Øvrelid<http://www.mn.uio.no/ifi/english/people/aca/liljao/index.html>
Department of Informatics
University of Oslo
The Path to Norwegian Dependency Parsing and Beyond
A syntactic treebank constitutes an important language resource for establishing a set of natural language processing tools for a language. Over the past decade, dependency analysis has become an increasingly popular form of syntactic analysis and has been claimed to strike a balance between a depth of analysis sufficient for many downstream applications and accuracy and efficiency in parsing with these types of representations. Until recently, however, no treebank was publicly available for Norwegian, so the progress in parsing and applications described above has not been possible. In this talk I will present the recently completed Norwegian Dependency Treebank and discuss some aspects of the annotation process, with a particular focus on the influence of pre-processing on syntactic annotation. I will then go on to present parsing results for Norwegian and briefly discuss an application of parsing to sentence-based sentiment analysis.
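As a rough illustration of the kind of representation a dependency treebank stores, here is a small sketch with an English example sentence and invented relation labels; it does not reproduce the Norwegian Dependency Treebank's actual annotation scheme. Each token is linked to the index of its head together with a grammatical relation, as in CoNLL-style formats.

    # A dependency analysis records, for every token, its head and relation.
    sentence = ["She", "reads", "old", "books"]

    # (token_index, head_index, relation); head index 0 is the artificial root.
    dependency_tree = [
        (1, 2, "subj"),   # "She"   <- subject of "reads"
        (2, 0, "root"),   # "reads" is the root of the sentence
        (3, 4, "attr"),   # "old"   <- attribute of "books"
        (4, 2, "obj"),    # "books" <- object of "reads"
    ]

    for idx, head, rel in dependency_tree:
        head_word = "ROOT" if head == 0 else sentence[head - 1]
        print(f"{sentence[idx - 1]:6s} --{rel}--> {head_word}")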
14.45-15.15 Break
15.15-16.00 Jon Dehdari<http://www.dfki.de/lt/staff.php>
Language Technology Lab
DFKI Saarbrücken
A Neurophysiologically-Inspired Statistical Language Model
We describe a statistical language model having components that are inspired by macroscopic electrophysiological activities in the brain. These components correspond to important language-relevant event-related potentials measured using electroencephalography. We relate neural signals involved in local- and long-distance grammatical processing, as well as local- and long-distance lexical processing, to statistical language models that are scalable, cross-linguistic, and incremental. We develop a novel language model component that unifies n-gram, cache, skip, and trigger language models into a generalized model inspired by the long-distance lexical event-related potential (N400). This component also exhibits some structural similarities with Elman network-based language models (commonly referred to as RNN LMs). The model is trained online, allowing for use with streaming text. We show consistent perplexity improvements over 4-gram modified Kneser-Ney language models for large-scale datasets in English, Arabic, Croatian, and Hungarian.
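As background on the cache and trigger components mentioned above, the following generic sketch (our own simplification, not the model presented in the talk) shows how a unigram cache over the recent document history can be interpolated with a static n-gram probability, so that words seen recently become more likely; this is the kind of long-distance lexical effect the abstract relates to the N400.

    from collections import Counter

    def cached_prob(word, history, static_prob, lam=0.1):
        """Interpolate a static model probability with a document cache.

        word        -- the word being predicted
        history     -- list of previously observed words in the document
        static_prob -- P(word | context) from the background n-gram model
        lam         -- interpolation weight given to the cache component
        """
        cache = Counter(history)
        cache_prob = cache[word] / len(history) if history else 0.0
        return (1 - lam) * static_prob + lam * cache_prob

    # Toy usage: "parliament" becomes more likely once it has occurred.
    history = "the parliament met today and the".split()
    print(cached_prob("parliament", history, static_prob=0.0001))
    print(cached_prob("zebra", history, static_prob=0.0001))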
16.00-16.45 Hercules Dalianis<http://people.dsv.su.se/~hercules/>
DSV
Stockholm University
Clinical Text Retrieval – Some Challenges, Methods and Applications Using Swedish Patient Records to Improve Health Care
Health care faces many challenges in monitoring and predicting adverse events such as healthcare-associated infections or adverse drug events: when and how many have occurred, and how can one predict them? Electronic patient records contain a vast source of information, both structured information such as diagnosis codes, drug codes, lab values, and time stamps, and unstructured information in the form of free text. How can we use this information to construct tools that help hospitals improve the quality of health care? We will study the noisy clinical texts and how we cope with them, look at different classification and annotation techniques for the texts, and finally present some results of our machine learning tools trained on the annotated corpora.
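As an illustration of that last step, here is a minimal sketch of training a text classifier on annotated notes with scikit-learn. The example notes, labels, and task framing are invented purely for illustration and are not the group's actual data or pipeline.

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Hypothetical annotated notes: 1 = suspected healthcare-associated infection.
    notes = [
        "fever and redness around the catheter site",
        "wound shows purulent discharge, started antibiotics",
        "routine follow-up, patient recovering well",
        "blood pressure stable, no complaints today",
    ]
    labels = [1, 1, 0, 0]

    # Bag-of-words/bigram features feeding a linear classifier.
    model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
    model.fit(notes, labels)

    print(model.predict(["redness and discharge at the surgical wound"]))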
Source: Corpora List