Colloquium: Reducing Human Involvement in Error Detection (Prof. Dr. Ziawasch Abedjan)

28.06.2019

The Institut für Informationsverarbeitung invites you to the following talk:

Reducing Human Involvement in Error Detection
Prof. Dr. Ziawasch Abedjan
TU Berlin, head of the "Big Data Management" group

Abstract: Data cleaning is one of the most time-consuming and tedious tasks in data-driven work. Typically, it entails the identification of erroneous values and their correction. Effective error detection can significantly improve the subsequent correction step. Research in error detection has provided a variety of approaches, most of which require some prior knowledge about the dataset in order to set up and configure the approach with rules, sensitivity thresholds, or other parameters. Often these approaches cover only a certain type of error. Recently, novel machine learning techniques have been proposed that treat error detection as a classification task. These approaches still require large amounts of training data, scaling with the size of the dataset, to cover the variety of error types residing in it. In this talk, I will present our work in progress towards a holistic error detection system that significantly reduces the number of required labels by leveraging label propagation techniques and meta-learning. In a nutshell, we leverage existing error detection techniques as feature generators. First, I discuss how manually configured off-the-shelf error detection techniques can be aggregated and automatically selected. Then I show how both approaches can be combined and refined into a configuration-free error detection system that requires only about 20 labeled tuples to outperform state-of-the-art techniques.

Time: 28.06.2019, 11:00
Location: Institut für Informationsverarbeitung 1307, 13th floor
Appelstrasse 9A