GermEval

GermEval is a series of shared task evaluation campaigns that focus on Natural Language Processing for the German language. The results of the shared tasks will be presented in a joint pre-conference workshop.

Shared Task 1

Hierarchical classification of blurbs

Hierarchical multi-label classification (HMC) of Blurbs is the task of assigning multiple labels to a short descriptive text, where each label is part of an underlying hierarchy of categories. The increasing number of available digital documents and the need for more and finer-grained categories call for new, more robust and sophisticated text classification methods. Large datasets often incorporate a hierarchy that can be used to categorize documents at different levels of specificity. Traditional multi-class text classification is thoroughly researched, but it fails to generalize adequately as the amount of available data grows and more specific hierarchies become necessary.

With this task we aim to foster research within this context. The task focuses on classifying German books into their respective hierarchically structured writing genres using short advertisement texts (Blurbs) and further meta information such as author, number of pages, release date, etc.
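To illustrate what it means for each label to be part of a hierarchy, here is a minimal Python sketch. The genre names, the PARENT mapping, and the helper with_ancestors are hypothetical and chosen only for illustration; they are not taken from the task data or from any official tooling. The sketch assumes a tree-shaped hierarchy and expands a set of predicted genres so that every label is accompanied by all of its ancestors.

```python
from typing import Dict, Optional, Set

# Hypothetical genre hierarchy (illustrative only): child -> parent.
# None marks a top-level genre.
PARENT: Dict[str, Optional[str]] = {
    "Literatur & Unterhaltung": None,
    "Krimi & Thriller": "Literatur & Unterhaltung",
    "Sachbuch": None,
    "Geschichte": "Sachbuch",
}


def with_ancestors(labels: Set[str]) -> Set[str]:
    """Return the given labels plus all of their ancestors in the hierarchy."""
    closed: Set[str] = set()
    for label in labels:
        node: Optional[str] = label
        # Walk up towards the root; stop early if this branch was already added.
        while node is not None and node not in closed:
            closed.add(node)
            node = PARENT[node]
    return closed


if __name__ == "__main__":
    # A book may carry several labels from different branches of the hierarchy.
    predicted = {"Krimi & Thriller", "Geschichte"}
    print(sorted(with_ancestors(predicted)))
```

A classifier that predicts only the most specific genres could use such a step to make its output consistent with the hierarchy before evaluation. Whether that is appropriate depends on how the task defines its labels and metrics.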

The shared task is organized by Rami Aly, Steffen Remus, and Chris Biemann from the Language Technology group of the University of Hamburg, Germany. For further information, see the task website.

Shared Task 2

Identification of offensive language

Offensive language is commonly defined as hurtful, derogatory or obscene comments made by one person to another person. This type of language is increasingly found on the web. As a consequence, many operators of social media websites can no longer monitor user posts manually. Therefore, there is a pressing demand for methods to automatically identify suspicious posts.

This shared task aims to initiate and foster research on the identification of offensive content in German-language microposts. Offensive comments are to be detected in a set of German tweets. We focus on Twitter since tweets can be regarded as a prototypical type of micropost.

This task is organized by Manfred Klenner (Universität Zürich), Josef Ruppenhofer (Institut für deutsche Sprache, Mannheim), Melanie Siegel (Hochschule Darmstadt), Julia Maria Struß (Fachhochschule Potsdam), and Michael Wiegand (Universität Heidelberg). For further information, see the task website.

Shared Task 3

Lemmatization of German web and social media texts

The goal of the shared task is to encourage the developers of NLP applications to adapt their tools and resources to the lemmatization of German web pages and written German discourse in genres of computer-mediated communication (CMC). Examples of CMC genres are chats, forums, wiki talk pages, tweets, blog comments, social networks, SMS and WhatsApp dialogues. The shared task is a follow-up to the EmpiriST 2015 shared task, which focused on tokenization and POS tagging. The current task focuses on lemmatization, the next fundamental step in the NLP pipeline.

The task is organized by Natalie Dykes, Stefan Evert, Philipp Heinrich, Besim Kabashi, and Thomas Proisl from FAU Erlangen-Nuremberg. For further information, see the task website.