Inductive Learning
Direct 0-1 Loss Minimization and Margin Maximization with Boosting
We propose a boosting method, DirectBoost, a greedy coordinate descent algorithm that builds an ensemble of weak classifiers by directly minimizing the empirical classification error over labeled training examples. Once the training classification error is reduced to a local coordinatewise minimum, DirectBoost runs a greedy coordinate ascent algorithm that continuously adds weak classifiers to maximize any targeted, arbitrarily defined margins until it reaches a local coordinatewise maximum of the margins in a certain sense.
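The first phase described above, greedy coordinate descent on the training error, can be illustrated with a minimal sketch. It is not the DirectBoost algorithm itself: the function name `greedy_01_coordinate_descent`, the fixed pool of weak-classifier outputs `H`, and the small grid of candidate step sizes are all assumptions made for illustration, and the margin-maximization phase is omitted.

```python
import numpy as np

def greedy_01_coordinate_descent(H, y, steps=(0.25, 0.5, 1.0, 2.0), max_rounds=50):
    """Greedy coordinate descent on the empirical 0-1 loss (illustrative only).

    H: (n_samples, n_weak) array of weak-classifier outputs in {-1, +1}.
    y: (n_samples,) array of labels in {-1, +1}.
    Returns the ensemble weights and the final training error.
    """
    n, m = H.shape
    alpha = np.zeros(m)   # one coordinate per weak classifier
    score = np.zeros(n)   # current ensemble output on each example
    # A score of exactly 0 has sign 0 and is counted as an error.
    best_err = np.mean(np.sign(score) != y)
    for _ in range(max_rounds):
        best = None
        # Try every coordinate and every candidate step; keep the pair
        # that lowers the training error the most.
        for j in range(m):
            for s in steps:
                err = np.mean(np.sign(score + s * H[:, j]) != y)
                if err < best_err:
                    best_err, best = err, (j, s)
        if best is None:
            break  # no tried step improves the error: stop
        j, s = best
        alpha[j] += s
        score += s * H[:, j]
    return alpha, best_err
```

The loop stops as soon as no single-coordinate step on the grid lowers the training error, which mirrors the notion of a local coordinatewise minimum only approximately, since the grid is a simplification.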
Learning with Invariance via Linear Functionals on Reproducing Kernel Hilbert Space
Incorporating invariance information is important for many learning problems. To exploit invariances, most existing methods resort to approximations that either lead to expensive optimization problems such as semi-definite programming, or rely on separation oracles to retain tractability. Some methods further limit the space of functions and settle for non-convex models. In this paper, we propose a framework for learning in reproducing kernel Hilbert spaces (RKHS) using local invariances that explicitly characterize the behavior of the target function around data instances. These invariances are compactly encoded as linear functionals whose values are penalized by some loss function. Based on a representer theorem that we establish, our formulation can be efficiently optimized via a convex program. For the representer theorem to hold, the linear functionals are required to be bounded in the RKHS, and we show that this is true for a variety of commonly used RKHS and invariances. Experiments on learning with unlabeled data and transform invariances show that the proposed method yields better or similar results compared with the state of the art.
6081594975a764c8e3a691fa2b3a321d-Reviews.html
This paper proposes a new boosting method that represents a tradeoff between online and offline learning. The main idea of the method is to maintain a reservoir of training examples (of fixed size) from which to train the weak learners. At each boosting iteration, new examples are added to the reservoir and then a selection strategy is used to reduce the reservoir to its original fixed size before the weak learner is trained. Several naive selection strategies are proposed but the main contribution of the paper is a more sophisticated selection strategy whose goal is to remove examples from the reservoir so that a weak learner trained on the reduced set will minimize the error computed on the whole set before reduction. The resulting algorithm is applied on four computer vision datasets, where it is shown to outperform several other online boosting methods. The idea of using a reservoir is original and very interesting.
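The reservoir update the review describes (add new examples, then shrink back to a fixed size before training the weak learner) might look roughly like the sketch below. The selection rule used here, keeping the examples with the largest boosting weights, is only an assumed stand-in for a naive strategy; it is not the paper's more sophisticated selection method, and the name `update_reservoir` is hypothetical.

```python
import numpy as np

def update_reservoir(res_X, res_y, res_w, new_X, new_y, new_w, size):
    """Merge new examples into the reservoir, then reduce it to `size`.

    Assumed naive strategy: keep the `size` examples with the largest
    boosting weights. The weak learner would then be trained on the result.
    """
    X = np.vstack([res_X, new_X])
    y = np.concatenate([res_y, new_y])
    w = np.concatenate([res_w, new_w])
    # Stable sort on negated weights gives a descending order of weights.
    keep = np.argsort(-w, kind="stable")[:size]
    return X[keep], y[keep], w[keep]
```

Each boosting iteration would call this once with the freshly sampled examples, so memory use stays bounded by the reservoir size regardless of how many examples have been seen.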
52292e0c763fd027c6eba6b8f494d2eb-Reviews.html
Reviewer response to rebuttal: I have read through the authors' rebuttal and I am happy with the proposed changes. I have not changed my review as I already recommended this paper for acceptance. Previous Review: In this work, the authors develop a hierarchical generative model for producing and classifying written characters with the goal of achieving a high level of performance with just one training example. The model is rooted in learning the compositional structure of characters and the causal relationship that dictates how characters are produced. The model is compared to a simpler version of the model that does not represent character strokes, a deep Boltzmann machine approach, and a hierarchical deep learning method.
Transfer Learning in a Transductive Setting
Category models for objects or activities typically rely on supervised learning requiring sufficiently large training sets. Transferring knowledge from known categories to novel classes with no or only a few labels is far less researched even though it is a common scenario. In this work, we extend transfer learning with semi-supervised learning to exploit unlabeled instances of (novel) categories with no or only a few labeled instances. Our proposed approach, Propagated Semantic Transfer, combines three techniques. First, we transfer information from known to novel categories by incorporating external knowledge, such as linguistic or expert-specified information, e.g., by a mid-level layer of semantic attributes.
Altitude Training: Strong Bounds for Single-Layer Dropout, Stefan Wager, Sida Wang, and Percy Liang
Dropout training, originally designed for deep neural networks, has been successful on high-dimensional single-layer natural language tasks. This paper proposes a theoretical explanation for this phenomenon: we show that, under a generative Poisson topic model with long documents, dropout training improves the exponent in the generalization bound for empirical risk minimization. Dropout achieves this gain much like a marathon runner who practices at altitude: once a classifier learns to perform reasonably well on training examples that have been artificially corrupted by dropout, it will do very well on the uncorrupted test set. We also show that, under similar conditions, dropout preserves the Bayes decision boundary and should therefore induce minimal bias in high dimensions.
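As a rough illustration of what "artificially corrupted by dropout" means for a single-layer model, the sketch below zeroes each input feature independently and rescales the survivors so that the corrupted features keep the same expectation. The function name `dropout_corrupt` and the parameter name `delta` (the drop probability) are assumptions for this example, and the rescaling convention is a common one rather than something stated in the abstract.

```python
import numpy as np

def dropout_corrupt(X, delta, rng):
    """Return a dropout-corrupted copy of the feature matrix X.

    Each entry is set to zero with probability `delta`; kept entries are
    divided by (1 - delta) so the corrupted features are unbiased.
    Requires 0 <= delta < 1.
    """
    mask = rng.random(X.shape) >= delta  # True where the feature is kept
    return X * mask / (1.0 - delta)

# Usage: corrupt a small count matrix, as one might before each training pass.
rng = np.random.default_rng(0)
X = np.array([[2.0, 0.0, 1.0], [0.0, 3.0, 1.0]])
X_tilde = dropout_corrupt(X, delta=0.5, rng=rng)
```

A classifier would be fit on corrupted copies such as `X_tilde` and then evaluated on the uncorrupted features, which is the training-at-altitude analogy in the abstract.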
Learning Distributed Representations for Structured Output Prediction
In recent years, distributed representations of inputs have led to performance gains in many applications by allowing statistical information to be shared across inputs. However, the predicted outputs (labels, and more generally structures) are still treated as discrete objects even though outputs are often not discrete units of meaning. In this paper, we present a new formulation for structured prediction where we represent individual labels in a structure as dense vectors and allow semantically similar labels to share parameters. We extend this representation to larger structures by defining compositionality using tensor products to give a natural generalization of standard structured prediction approaches. We define a learning objective for jointly learning the model parameters and the label vectors and propose an alternating minimization algorithm for learning. We show that our formulation outperforms structural SVM baselines in two tasks: multiclass document classification and part-of-speech tagging.
A Representation Theory for Ranking Functions
This paper presents a representation theory for permutation-valued functions, which in their general form can also be called listwise ranking functions. Pointwise ranking functions assign a score to each object independently, without taking into account the other objects under consideration; whereas listwise loss functions evaluate the set of scores assigned to all objects as a whole. In many supervised learning to rank tasks, it might be of interest to use listwise ranking functions instead; in particular, the Bayes Optimal ranking functions might themselves be listwise, especially if the loss function is listwise. A key caveat to using listwise ranking functions has been the lack of an appropriate representation theory for such functions. We show that a natural symmetricity assumption that we call exchangeability allows us to explicitly characterize the set of such exchangeable listwise ranking functions. Our analysis draws from the theories of tensor analysis, functional analysis and De Finetti theorems. We also present experiments using a novel reranking method motivated by our representation theory.
Semi-supervised Learning with Deep Generative Models, Max Welling, Machine Learning Group, Univ. of Amsterdam, {D.P.Kingma, M.Welling}@uva.nl
The ever-increasing size of modern data sets combined with the difficulty of obtaining label information has made semi-supervised learning one of the problems of significant practical importance in modern data analysis. We revisit the approach to semi-supervised learning with generative models and develop new models that allow for effective generalisation from small labelled data sets to large unlabelled ones. Generative approaches have thus far been either inflexible, inefficient or non-scalable. We show that deep generative models and approximate Bayesian inference exploiting recent advances in variational methods can be used to provide significant improvements, making generative approaches highly competitive for semi-supervised learning.