Goto

Collaborating Authors

 Computational Learning Theory


Optimistic Rates for Multi-Task Representation Learning

Neural Information Processing Systems

We study the problem of transfer learning via Multi-Task Representation Learning (MTRL), wherein multiple source tasks are used to learn a good common representation, and a predictor is trained on top of it for the target task. Under standard regularity assumptions on the loss function and task diversity, we provide new statistical rates on the excess risk of the target task, which demonstrate the benefit of representation learning. Importantly, our rates are optimistic, i.e., they interpolate between the standard O(m


Optimistic Rates for Multi-Task Representation Learning

Neural Information Processing Systems

We study the problem of transfer learning via Multi-Task Representation Learning (MTRL), wherein multiple source tasks are used to learn a good common representation, and a predictor is trained on top of it for the target task. Under standard regularity assumptions on the loss function and task diversity, we provide new statistical rates on the excess risk of the target task, which demonstrate the benefit of representation learning. Importantly, our rates are optimistic, i.e., they interpolate between the standard O(m




Multiclass Boosting: Simple and Intuitive Weak Learning Criteria Amit Daniely Department of Computer Science Department of Computer Science Princeton University

Neural Information Processing Systems

We study a generalization of boosting to the multiclass setting. We introduce a weak learning condition for multiclass classification that captures the original notion of weak learnability as being "slightly better than random guessing". We give a simple and efficient boosting algorithm, that does not require realizability assumptions and its sample and oracle complexity bounds are independent of the number of classes. In addition, we utilize our new boosting technique in several theoretical applications within the context of List PAC Learning. First, we establish an equivalence to weak PAC learning. Furthermore, we present a new result on boosting for list learners, as well as provide a novel proof for the characterization of multiclass PAC learning and List PAC learning. Notably, our technique gives rise to a simplified analysis, and also implies an improved error bound for large list sizes, compared to previous results.


Multiclass Boosting: Simple and Intuitive Weak Learning Criteria Amit Daniely Department of Computer Science Department of Computer Science Princeton University

Neural Information Processing Systems

We study a generalization of boosting to the multiclass setting. We introduce a weak learning condition for multiclass classification that captures the original notion of weak learnability as being "slightly better than random guessing". We give a simple and efficient boosting algorithm, that does not require realizability assumptions and its sample and oracle complexity bounds are independent of the number of classes. In addition, we utilize our new boosting technique in several theoretical applications within the context of List PAC Learning. First, we establish an equivalence to weak PAC learning. Furthermore, we present a new result on boosting for list learners, as well as provide a novel proof for the characterization of multiclass PAC learning and List PAC learning. Notably, our technique gives rise to a simplified analysis, and also implies an improved error bound for large list sizes, compared to previous results.


Mistake Bounds for Binary Matrix Completion

Neural Information Processing Systems

We study the problem of completing a binary matrix in an online learning setting. On each trial we predict a matrix entry and then receive the true entry. We propose a Matrix Exponentiated Gradient algorithm [1] to solve this problem. We provide a mistake bound for the algorithm, which scales with the margin complexity [2, 3] of the underlying matrix. The bound suggests an interpretation where each row of the matrix is a prediction task over a finite set of objects, the columns. Using this we show that the algorithm makes a number of mistakes which is comparable up to a logarithmic factor to the number of mistakes made by the Kernel Perceptron with an optimal kernel in hindsight. We discuss applications of the algorithm to predicting as well as the best biclustering and to the problem of predicting the labeling of a graph without knowing the graph in advance.


Reviews: Interaction Screening: Efficient and Sample-Optimal Learning of Ising Models

Neural Information Processing Systems

The main effort is to improve upon the recent results provided by Bresler, showing that the complexity of identifying the structure of max degree d Ising model is polynomial in p and independent of d. Strong Points: 1) The timeliness of the topic in this paper is good, meaning that there is currently ongoing interest and work on Ising model reconstruction. Weak points: 1) The whole approach is based on the introduction of the ISO. This is the main trick in the proposed approach. Other than that, the rest of the method and its analysis are usual and well studied (l_1-penalization and connection with the tutorial by Negahban et.



On the Recursive Teaching Dimension of VC Classes

Neural Information Processing Systems

In this paper, we study the quantitative relation between RTD and the well-known learning complexity measure VC dimension (VCD), and improve the best known upper and (worst-case) lower bounds on the recursive teaching dimension with respect to the VC dimension.