Collaborating Authors

 Technical University of Denmark


Deep Learning from Crowds

AAAI Conferences

Over the last few years, deep learning has revolutionized the field of machine learning by dramatically improving the state-of-the-art in various domains. However, as the size of supervised artificial neural networks grows, typically so does the need for larger labeled datasets. Recently, crowdsourcing has established itself as an efficient and cost-effective solution for labeling large sets of data in a scalable manner, but it often requires aggregating labels from multiple noisy contributors with different levels of expertise. In this paper, we address the problem of learning deep neural networks from crowds. We begin by describing an EM algorithm for jointly learning the parameters of the network and the reliabilities of the annotators. Then, a novel general-purpose crowd layer is proposed, which allows us to train deep neural networks end-to-end, directly from the noisy labels of multiple annotators, using only backpropagation. We empirically show that the proposed approach is able to internally capture the reliability and biases of different annotators and achieve new state-of-the-art results for various crowdsourced datasets across different settings, namely classification, regression and sequence labeling.
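
To make the crowd-layer idea concrete, the sketch below shows one way such a layer could work for classification. It is illustrative only and is not taken from the paper: the per-annotator square matrix `annotator_weights`, the use of `-1` to mark a missing label, and the function names are all assumptions. It shows the forward pass and a loss that ignores annotators who did not label an item; in practice the gradients would come from an automatic differentiation framework.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax along the given axis.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def crowd_layer_forward(logits, annotator_weights):
    # logits: (N, C) outputs of the base network for N items and C classes.
    # annotator_weights: (R, C, C), one hypothetical matrix per annotator.
    # Result: (N, R, C) predicted label distribution for each annotator.
    per_annotator = np.einsum("rij,nj->nri", annotator_weights, logits)
    return softmax(per_annotator, axis=-1)

def masked_cross_entropy(probs, labels):
    # probs: (N, R, C); labels: (N, R) integer labels, -1 where missing.
    mask = labels >= 0
    safe = np.where(mask, labels, 0)  # placeholder index for missing labels
    picked = np.take_along_axis(probs, safe[..., None], axis=-1)[..., 0]
    losses = -np.log(picked + 1e-12) * mask
    return losses.sum() / max(mask.sum(), 1)

# Example: 2 items, 3 classes, 4 annotators initialised to the identity.
logits = np.array([[2.0, 0.0, -1.0], [0.0, 1.0, 0.0]])
weights = np.stack([np.eye(3)] * 4)
labels = np.array([[0, 0, -1, 1], [1, -1, 1, 2]])
probs = crowd_layer_forward(logits, weights)
loss = masked_cross_entropy(probs, labels)
```

Initialising each matrix to the identity means every annotator starts out agreeing with the base network; training would then be free to move each matrix away from the identity to absorb that annotator's noise and bias.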


Indexable Probabilistic Matrix Factorization for Maximum Inner Product Search

AAAI Conferences

The Maximum Inner Product Search (MIPS) problem, prevalent in matrix factorization-based recommender systems, scales linearly with the number of objects to score. Recent work has shown that clever post-processing steps can turn the MIPS problem into a nearest neighbour one, allowing sublinear retrieval time either through Locality Sensitive Hashing or various tree structures that partition the Euclidean space. This work shows that instead of employing post-processing steps, substantially faster retrieval times can be achieved for the same accuracy when inference is not decoupled from the indexing process. By framing matrix factorization to be natively indexable, so that any solution is immediately sublinearly searchable, we use the machinery of Machine Learning to best learn such a solution. We introduce Indexable Probabilistic Matrix Factorization (IPMF) to shift the traditional post-processing complexity into the training phase of the model. Its inference procedure is based on Geodesic Monte Carlo, and adds minimal additional computational cost to standard Monte Carlo methods for matrix factorization. By coupling inference and indexing in this way, we achieve more than a 50% improvement in retrieval time against two state-of-the-art methods, for a given level of accuracy in the recommendations of two large-scale recommender systems.
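
For readers unfamiliar with the post-processing reductions mentioned in the abstract, the sketch below illustrates one well-known way to turn MIPS into Euclidean nearest-neighbour search: append one coordinate to every item vector so that all items share the same norm. The abstract does not say which reduction its baselines use, so this is an assumption, and the function names are hypothetical. It is not the IPMF method itself, which avoids this step. Brute-force search stands in for the tree or hashing index.

```python
import numpy as np

def augment_items(items):
    # items: (n, d). Append sqrt(M^2 - ||x||^2) so every row has norm M,
    # where M is the largest item norm.
    norms_sq = np.sum(items ** 2, axis=1)
    extra = np.sqrt(np.maximum(norms_sq.max() - norms_sq, 0.0))
    return np.hstack([items, extra[:, None]])

def augment_query(query):
    # The query gets a zero in the extra coordinate, so inner products
    # with the augmented items are unchanged.
    return np.append(query, 0.0)

def mips_brute_force(items, query):
    # Index of the item with the largest inner product with the query.
    return int(np.argmax(items @ query))

def nn_brute_force(items, query):
    # Index of the item closest to the query in Euclidean distance.
    return int(np.argmin(np.sum((items - query) ** 2, axis=1)))

# After augmentation, ||q' - x'||^2 = ||q||^2 + M^2 - 2 <q, x>, so the
# nearest neighbour of q' is the maximum-inner-product item for q.
rng = np.random.default_rng(0)
items = rng.normal(size=(50, 8))
queries = rng.normal(size=(5, 8))
aug_items = augment_items(items)
matches = [
    mips_brute_force(items, q) == nn_brute_force(aug_items, augment_query(q))
    for q in queries
]
```

Because the augmented items all have the same norm, any Euclidean index built over `aug_items` can answer inner-product queries, at the cost of an extra pass over the learned item vectors after training.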


Tweetin' in the Rain: Exploring Societal-Scale Effects of Weather on Mood

AAAI Conferences

There has been significant recent interest in using the aggregate sentiment from social media sites to understand and predict real-world phenomena. However, the data from social media sites also offers a unique and, so far, unexplored opportunity to study the impact of external factors on aggregate sentiment, at the scale of a society. Using a Twitter-specific sentiment extraction methodology, we explore the patterns of sentiment present in a corpus of over 1.5 billion tweets. We focus primarily on the effect of the weather and time on aggregate sentiment, evaluating how clearly the well-known individual patterns translate into population-wide patterns. Using machine learning techniques on the Twitter corpus correlated with the weather at the time and location of the tweets, we find that aggregate sentiment follows distinct climate, temporal, and seasonal patterns.