Computational Learning Theory


Local Regularizers Are Not Transductive Learners

arXiv.org Machine Learning

We partly resolve an open question raised by Asilis et al. (COLT 2024): whether the algorithmic template of local regularization -- an intriguing generalization of explicit regularization, a.k.a. structural risk minimization -- suffices to learn all learnable multiclass problems. Specifically, we provide a negative answer to this question in the transductive model of learning. We exhibit a multiclass classification problem which is learnable in both the transductive and PAC models, yet cannot be learned transductively by any local regularizer. The corresponding hypothesis class, and our proof, are based on principles from cryptographic secret sharing. We outline challenges in extending our negative result to the PAC model, leaving open the tantalizing possibility of a PAC/transductive separation with respect to local regularization.
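The hypothesis class above is built on cryptographic secret sharing. As a minimal illustration of that principle only (not the paper's construction), a 2-out-of-2 XOR scheme splits a secret into two shares, each of which is uniformly random on its own and reveals nothing without the other:

```python
import secrets

def share(secret: int, n_bits: int = 32) -> tuple[int, int]:
    """Split an integer secret into two XOR shares. Either share alone
    is uniformly random, so it carries no information about the secret."""
    s1 = secrets.randbits(n_bits)
    s2 = secret ^ s1
    return s1, s2

def reconstruct(s1: int, s2: int) -> int:
    """Recover the secret; this requires both shares."""
    return s1 ^ s2
```

The learning-theoretic construction exploits the same all-or-nothing flavor: partial information about the labeled sample is useless to a local regularizer.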


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems

This paper studies the problem of learning shuffle ideals in the PAC-learning model under simple distributions, such as product distributions or distributions generated by Markov chains. A shuffle ideal is a particularly simple type of regular language. Given an alphabet \Sigma and a string u = (u_1, \dots, u_L), the prime shuffle ideal generated by u is the set of strings obtained by inserting arbitrary strings s_0, \dots, s_L into u to obtain s_0 u_1 s_1 \dots u_L s_L. The shuffle ideal is a more general class of regular languages where, instead of a single string u, we consider strings in a product set U = U_1 \times \dots \times U_L with U_i \subseteq \Sigma, and take the union of the prime shuffle ideals generated by the strings in U. The problem of learning regular languages has many applications.
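Membership in a shuffle ideal reduces to a greedy subsequence check; the sketch below (function name illustrative, not from the paper) tests whether a string belongs to the ideal generated by a product set U = U_1 \times \dots \times U_L:

```python
def in_shuffle_ideal(s: str, U: list[set[str]]) -> bool:
    """Return True iff s lies in the shuffle ideal of U_1 x ... x U_L,
    i.e. some u with u_j in U_j occurs in s as a subsequence.
    Greedily matching each U_j at the earliest opportunity is optimal
    by a standard exchange argument."""
    j = 0
    for c in s:
        if j < len(U) and c in U[j]:
            j += 1
    return j == len(U)
```

The learning problem is hard precisely because membership is easy but the generating sets U_1, \dots, U_L must be inferred from labeled examples.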


On the Computability of Multiclass PAC Learning

arXiv.org Machine Learning

We study the problem of computable multiclass learnability within the Probably Approximately Correct (PAC) learning framework of Valiant (1984). In the recently introduced computable PAC (CPAC) learning framework of Agarwal et al. (2020), both learners and the functions they output are required to be computable. We focus on the case of finite label space and start by proposing a computable version of the Natarajan dimension and showing that it characterizes CPAC learnability in this setting. We further generalize this result by establishing a meta-characterization of CPAC learnability for a certain family of dimensions: computable distinguishers. Distinguishers were defined by Ben-David et al. (1992) as a certain family of embeddings of the label space, with each embedding giving rise to a dimension. It was shown that the finiteness of each such dimension characterizes multiclass PAC learnability for finite label space in the non-computable setting. We show that the corresponding computable dimensions for distinguishers characterize CPAC learning. We conclude our analysis by proving that the DS dimension, which characterizes PAC learnability for infinite label space, cannot be expressed as a distinguisher (even in the case of finite label space).
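For finite hypothesis classes over finitely many points, Natarajan shattering (the combinatorial notion behind the Natarajan dimension) can be checked by brute force. The sketch below is an illustration of the definition, not anything from the paper:

```python
from itertools import product

def natarajan_shattered(points, hypotheses) -> bool:
    """Check whether `points` is Natarajan-shattered by the finite class
    `hypotheses` (each a function on points): there must exist f, g with
    f(x) != g(x) for all x such that every f/g-mixture is realized.
    Restricting f, g to realized patterns is WLOG, since the mixtures for
    T = S and T = {} must themselves be realized."""
    patterns = {tuple(h(x) for x in points) for h in hypotheses}
    d = len(points)
    for f in patterns:
        for g in patterns:
            if all(f[i] != g[i] for i in range(d)):
                if all(tuple(f[i] if b else g[i] for i, b in enumerate(bits))
                       in patterns
                       for bits in product([0, 1], repeat=d)):
                    return True
    return False
```

For binary labels this recovers VC shattering; the computable version studied in the paper additionally demands that the witnessing objects be computable.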


Review for NeurIPS paper: Experimental design for MRI by greedy policy search

Neural Information Processing Systems

Weaknesses: The main claim of the paper is the hypothesis that the noise in the non-greedy objective's gradients is the reason why the greedy method can outperform it. However, I think that the empirical methodology is not strong enough to back this claim up, as the experiments are carried out on a single dataset, using a single network architecture, and reported with a single performance metric. The hypothesis would be much more clearly substantiated if the noise in the gradients were shown to be a consistent trend across various settings; I am afraid that, in the current state, the conclusion could reflect anecdotal performance in the given setting. In addition, if I am correct, RL models are prone to unstable training and are generally hard to train well. How can you confidently ensure that this behaviour is not due to the RL policy not being trained for long enough?


Review for NeurIPS paper: Experimental design for MRI by greedy policy search

Neural Information Processing Systems

Three knowledgeable referees agree that the paper makes a valuable contribution to the MRI acceleration and reconstruction literature. They recognize the soundness of the proposed approach and its compelling experimental validation. I agree with the referees and recommend acceptance.


Efficient Optimal PAC Learning

arXiv.org Artificial Intelligence

Recent advances in the binary classification setting by Hanneke [2016b] and Larsen [2023] have resulted in optimal PAC learners. These learners leverage, respectively, a clever deterministic subsampling scheme and the classic bagging heuristic of Breiman [1996]. Both optimal PAC learners use, as a subroutine, the natural algorithm of empirical risk minimization. Consequently, the computational cost of these optimal PAC learners is tied to that of the empirical risk minimizer algorithm. In this work, we seek to provide an alternative perspective on the computational cost imposed by the link to the empirical risk minimizer algorithm. To this end, we show the existence of an optimal PAC learner which offers a different tradeoff in terms of the computational cost induced by the empirical risk minimizer.
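A minimal sketch of the bagging idea with a toy one-dimensional threshold ERM follows; the function names, the voting count, and the toy ERM are illustrative, and this is not the actual optimal learner:

```python
import random

def threshold_erm(sample):
    """Toy ERM over 1-D threshold classifiers h_t(x) = (x >= t)."""
    candidates = [x for x, _ in sample] + [float("-inf"), float("inf")]
    t = min(candidates, key=lambda t: sum((x >= t) != y for x, y in sample))
    return lambda x, t=t: x >= t

def bagged_learner(sample, erm, n_votes=5, seed=0):
    """Majority vote over ERM hypotheses fit to bootstrap resamples of
    the training set, in the spirit of the bagging-based learner."""
    rng = random.Random(seed)
    hyps = [erm([rng.choice(sample) for _ in sample]) for _ in range(n_votes)]
    def majority(x):
        votes = [h(x) for h in hyps]
        return max(set(votes), key=votes.count)
    return majority
```

Each vote only ever calls the ERM as a black box, which is exactly why the computational cost of such learners is inherited from the empirical risk minimizer.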


On characterizing optimal learning trajectories in a class of learning problems

arXiv.org Machine Learning

In this brief paper, we provide a mathematical framework that exploits the relationship between the maximum principle and dynamic programming for characterizing optimal learning trajectories in a class of learning problems related to point estimation for modeling high-dimensional nonlinear functions. Here, the characterization of the optimal learning trajectories is associated with the solution of an optimal control problem for a weakly-controlled gradient system with small parameters, whose time evolution is guided by a model training dataset and its perturbed version. The optimization problem consists of a cost functional that gauges the quality of the estimated model parameters at a fixed final time w.r.t. a model validation dataset. Moreover, using a successive Galerkin approximation method, we provide an algorithmic recipe for constructing the corresponding optimal learning trajectories leading to the optimal estimated model parameters for this class of learning problems.


Blackwell's Approachability with Approximation Algorithms

arXiv.org Artificial Intelligence

We revisit Blackwell's celebrated approachability problem, which considers a repeated vector-valued game between a player and an adversary. Motivated by settings in which the action set of the player or adversary (or both) is difficult to optimize over, for instance when it corresponds to the set of all possible solutions to some NP-Hard optimization problem, we ask what the player can guarantee \textit{efficiently} when only having access to these sets via approximation algorithms with ratios $\alpha_{\mX} \geq 1$ and $ 1 \geq \alpha_{\mY} > 0$, respectively. Assuming the player has monotone preferences, in the sense that he does not prefer a vector-valued loss $\ell_1$ over $\ell_2$ if $\ell_2 \leq \ell_1$, we establish that given a Blackwell instance with an approachable target set $S$, the downward closure of the appropriately-scaled set $\alpha_{\mX}\alpha_{\mY}^{-1}S$ is \textit{efficiently} approachable with optimal rate. In case only the player's or adversary's set is equipped with an approximation algorithm, we give simpler and more efficient algorithms.


Sample Complexity of Bias Detection with Subsampled Point-to-Subspace Distances

arXiv.org Artificial Intelligence

Sample complexity of bias estimation is a lower bound on the runtime of any bias detection method. Many regulatory frameworks require the bias to be tested for all subgroups, whose number grows exponentially with the number of protected attributes. Unless one wishes to run bias detection with a doubly-exponential runtime, one should like to have polynomial complexity of bias detection for a single subgroup. At the same time, the reference data may be based on surveys, and thus come with non-trivial uncertainty. Here, we reformulate bias detection as a point-to-subspace problem on the space of measures and show that, for the supremum norm, it can be subsampled efficiently. In particular, our probably approximately correct (PAC) results are corroborated by tests on well-known instances.
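As a toy sketch of subsampling a supremum-norm distance between discrete measures (a point-to-point simplification, purely illustrative; the paper's point-to-subspace estimator and its guarantees are more involved), checking k randomly chosen coordinates yields a lower bound that becomes exact once all coordinates are sampled:

```python
import random

def subsampled_sup_distance(p, q, k, seed=0):
    """Estimate max_i |p_i - q_i| from k coordinates sampled without
    replacement; always a lower bound on the true sup-norm distance."""
    rng = random.Random(seed)
    idx = rng.sample(range(len(p)), k)
    return max(abs(p[i] - q[i]) for i in idx)
```

The point of a PAC analysis in this vein is to bound how large k must be so that, with high probability, the subsampled value is close to the true supremum.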


Networks with Finite VC Dimension: Pro and Contra

arXiv.org Machine Learning

Approximation and learning of classifiers of large data sets by neural networks in terms of high-dimensional geometry and statistical learning theory are investigated. The influence of the VC dimension of sets of input-output functions of networks on approximation capabilities is compared with its influence on consistency in learning from samples of data. It is shown that, whereas finite VC dimension is desirable for uniform convergence of empirical errors, it may not be desirable for approximation of functions drawn from a probability distribution modeling the likelihood that they occur in a given type of application. Based on the concentration-of-measure properties of high dimensional geometry, it is proven that both errors in approximation and empirical errors behave almost deterministically for networks implementing sets of input-output functions with finite VC dimensions in processing large data sets. Practical limitations of the universal approximation property, the trade-offs between the accuracy of approximation and consistency in learning from data, and the influence of depth of networks with ReLU units on their accuracy and consistency are discussed.