Goto

Collaborating Authors

 Widmer, Christian


End-to-End Annotator Bias Approximation on Crowdsourced Single-Label Sentiment Analysis

arXiv.org Artificial Intelligence

Sentiment analysis is often a crowdsourcing task prone to subjective labels given by many annotators. It is not yet fully understood how the annotation bias of each annotator can be modeled correctly with state-of-the-art methods. However, resolving annotator bias precisely and reliably is the key to understand annotators' labeling behavior and to successfully resolve corresponding individual misconceptions and wrongdoings regarding the annotation task. Our contribution is an explanation and improvement for precise neural end-to-end bias modeling and ground truth estimation, which reduces an undesired mismatch in that regard of the existing state-of-the-art. Classification experiments show that it has potential to improve accuracy in cases where each sample is annotated only by one single annotator. We provide the whole source code publicly and release an own domain-specific sentiment dataset containing 10,000 sentences discussing organic food products. These are crawled from social media and are singly labeled by 10 non-expert annotators.


An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis

Neural Information Processing Systems

We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance. Papers published at the Neural Information Processing Systems Conference.


Framework for Multi-task Multiple Kernel Learning and Applications in Genome Analysis

arXiv.org Machine Learning

We present a general regularization-based framework for Multi-task learning (MTL), in which the similarity between tasks can be learned or refined using $\ell_p$-norm Multiple Kernel learning (MKL). Based on this very general formulation (including a general loss function), we derive the corresponding dual formulation using Fenchel duality applied to Hermitian matrices. We show that numerous established MTL methods can be derived as special cases from both, the primal and dual of our formulation. Furthermore, we derive a modern dual-coordinate descend optimization strategy for the hinge-loss variant of our formulation and provide convergence bounds for our algorithm. As a special case, we implement in C++ a fast LibLinear-style solver for $\ell_p$-norm MKL. In the experimental section, we analyze various aspects of our algorithm such as predictive performance and ability to reconstruct task relationships on biologically inspired synthetic data, where we have full control over the underlying ground truth. We also experiment on a new dataset from the domain of computational biology that we collected for the purpose of this paper. It concerns the prediction of transcription start sites (TSS) over nine organisms, which is a crucial task in gene finding. Our solvers including all discussed special cases are made available as open-source software as part of the SHOGUN machine learning toolbox (available at \url{http://shogun.ml}).


Hierarchical Multitask Structured Output Learning for Large-scale Sequence Segmentation

Neural Information Processing Systems

We present a novel regularization-based Multitask Learning (MTL) formulation for Structured Output (SO) prediction for the case of hierarchical task relations. Structured output learning often results in difficult inference problems and requires large amounts of training data to obtain accurate models. We propose to use MTL to exploit information available for related structured output learning tasks by means of hierarchical regularization. Due to the combination of example sets, the cost of training models for structured output prediction can easily become infeasible for real world applications. We thus propose an efficient algorithm based on bundle methods to solve the optimization problems resulting from MTL structured output learning. We demonstrate the performance of our approach on gene finding problems from the application domain of computational biology. We show that 1) our proposed solver achieves much faster convergence than previous methods and 2) that the Hierarchical SO-MTL approach clearly outperforms considered non-MTL methods.


An Empirical Analysis of Domain Adaptation Algorithms for Genomic Sequence Analysis

Neural Information Processing Systems

We study the problem of domain transfer for a supervised classification task in mRNA splicing. We consider a number of recent domain transfer methods from machine learning, including some that are novel, and evaluate them on genomic sequence data from model organisms of varying evolutionary distance. We find that in cases where the organisms are not closely related, the use of domain adaptation methods can help improve classification performance.