Goto

Collaborating Authors

 Computational Learning Theory


Learning and Testing Convex Functions

arXiv.org Artificial Intelligence

We consider the problems of \emph{learning} and \emph{testing} real-valued convex functions over Gaussian space. Despite the extensive study of function convexity across mathematics, statistics, and computer science, its learnability and testability have largely been examined only in discrete or restricted settings -- typically with respect to the Hamming distance, which is ill-suited for real-valued functions. In contrast, we study these problems in high dimensions under the standard Gaussian measure, assuming sample access to the function and a mild smoothness condition, namely Lipschitzness. A smoothness assumption is natural and, in fact, necessary even in one dimension: without it, convexity cannot be inferred from finitely many samples. As our main results, we give: - Learning Convex Functions: An agnostic proper learning algorithm for Lipschitz convex functions that achieves error $\varepsilon$ using $n^{O(1/\varepsilon^2)}$ samples, together with a complementary lower bound of $n^{\mathrm{poly}(1/\varepsilon)}$ samples in the \emph{correlational statistical query (CSQ)} model. - Testing Convex Functions: A tolerant (two-sided) tester for convexity of Lipschitz functions with the same sample complexity (as a corollary of our learning result), and a one-sided tester (which never rejects convex functions) using $O(\sqrt{n}/\varepsilon)^n$ samples.



Is nasty noise actually harder than malicious noise?

arXiv.org Artificial Intelligence

We consider the relative abilities and limitations of computationally efficient algorithms for learning in the presence of noise, under two well-studied and challenging adversarial noise models for learning Boolean functions: malicious noise, in which an adversary can arbitrarily corrupt a random subset of examples given to the learner; and nasty noise, in which an adversary can arbitrarily corrupt an adversarially chosen subset of examples given to the learner. We consider both the distribution-independent and fixed-distribution settings. Our main results highlight a dramatic difference between these two settings: For distribution-independent learning, we prove a strong equivalence between the two noise models: If a class ${\cal C}$ of functions is efficiently learnable in the presence of $ฮท$-rate malicious noise, then it is also efficiently learnable in the presence of $ฮท$-rate nasty noise. In sharp contrast, for the fixed-distribution setting we show an arbitrarily large separation: Under a standard cryptographic assumption, for any arbitrarily large value $r$ there exists a concept class for which there is a ratio of $r$ between the rate $ฮท_{malicious}$ of malicious noise that polynomial-time learning algorithms can tolerate, versus the rate $ฮท_{nasty}$ of nasty noise that such learning algorithms can tolerate. To offset the negative result for the fixed-distribution setting, we define a broad and natural class of algorithms, namely those that ignore contradictory examples (ICE). We show that for these algorithms, malicious noise and nasty noise are equivalent up to a factor of two in the noise rate: Any efficient ICE learner that succeeds with $ฮท$-rate malicious noise can be converted to an efficient learner that succeeds with $ฮท/2$-rate nasty noise. We further show that the above factor of two is necessary, again under a standard cryptographic assumption.


The Probably Approximately Correct Learning Model in Computational Learning Theory

arXiv.org Machine Learning

Leslie Valiant's 1984 paper "A Theory of the Learnable" [Val84], reproduced in this volume, has the unusual distinction of having changed the course of several scientific disciplines. Within theoretical computer science it was one of the key works giving rise to the field now known as computational learning theory, which may loosely be defined as the rigorous study of learning processes and phenomena from the computer science perspective of efficient algorithms and computational complexity. In the decades since the publication of [Val84], computational learning theory has grown into a rich field with strong connections to many other theoretical disciplines such as mathematical probability and statistics, information theory, decision theory and more. Beyond the realm of theory, Valiant's paper and the Probably Approximately Correct (PAC) model which he introduced in it have also had a great impact on the subsequent development of machine learning, a field which has already transformed many aspects of science and human society and seems certain to have an even greater influence in the future. This chapter gives an overview of the Probably Approximately Correct learning model that Valiant introduced in [Val84], explaining some of the major results and directions that the field has taken in the years since that work.


Safe and Optimal Learning from Preferences via Weighted Temporal Logic with Applications in Robotics and Formula 1

arXiv.org Artificial Intelligence

Abstract--Autonomous systems increasingly rely on human feedback to align their behavior, expressed as pairwise comparisons, rankings, or demonstrations. While existing methods can adapt behaviors, they often fail to guarantee safety in safety-critical domains. We propose a safety-guaranteed, optimal, and efficient approach to solve the learning problem from preferences, rankings, or demonstrations using Weighted Signal T emporal Logic (WSTL). WSTL learning problems, when implemented naively, lead to multi-linear constraints in the weights to be learned. By introducing structural pruning and log-transform procedures, we reduce the problem size and recast the problem as a Mixed-Integer Linear Program while preserving safety guarantees. Experiments on robotic navigation and real-world Formula 1 data demonstrate that the method effectively captures nuanced preferences and models complex task objectives. Autonomous systems are increasingly part of our daily lives, from driverless cars in urban navigation to household robots performing domestic chores. Since these systems operate closely alongside humans, learning from human feedback is a natural way to ensure their behaviors align with human desires.


Information Science Principles of Machine Learning: A Causal Chain Meta-Framework Based on Formalized Information Mapping

arXiv.org Artificial Intelligence

This paper addresses the current lack of a unified formal framework in machine learning theory, as well as the absence of robust theoretical foundations for interpretability and ethical safety assurance. We first construct a formal information model, employing sets of well-formed formulas (WFFs) to explicitly define the ontological states and carrier mappings for the core components of machine learning. By introducing learnable and processable predicates, as well as learning and processing functions, we analyze the logical inference and constraint rules underlying causal chains in models, thereby establishing the Machine Learning Theory Meta-Framework (MLT-MF). Building upon this framework, we propose universal definitions for model interpretability and ethical safety, and rigorously prove and validate four key theorems: the equivalence between model interpretability and information existence, the constructive formulation of ethical safety assurance and two types of total variation distance (TVD) upper bounds. This work overcomes the limitations of previous fragmented approaches, providing a unified theoretical foundation from an information science perspective to systematically address the critical challenges currently facing machine learning.


A Fully Polynomial-Time Algorithm for Robustly Learning Halfspaces over the Hypercube

arXiv.org Artificial Intelligence

We give the first fully polynomial-time algorithm for learning halfspaces with respect to the uniform distribution on the hypercube in the presence of contamination, where an adversary may corrupt some fraction of examples and labels arbitrarily. We achieve an error guarantee of $ฮท^{O(1)}+ฮต$ where $ฮท$ is the noise rate. Such a result was not known even in the agnostic setting, where only labels can be adversarially corrupted. All prior work over the last two decades has a superpolynomial dependence in $1/ฮต$ or succeeds only with respect to continuous marginals (such as log-concave densities). Previous analyses rely heavily on various structural properties of continuous distributions such as anti-concentration. Our approach avoids these requirements and makes use of a new algorithm for learning Generalized Linear Models (GLMs) with only a polylogarithmic dependence on the activation function's Lipschitz constant. More generally, our framework shows that supervised learning with respect to discrete distributions is not as difficult as previously thought.


Private Online Learning against an Adaptive Adversary: Realizable and Agnostic Settings

arXiv.org Artificial Intelligence

We revisit the problem of private online learning, in which a learner receives a sequence of $T$ data points and has to respond at each time-step a hypothesis. It is required that the entire stream of output hypotheses should satisfy differential privacy. Prior work of Golowich and Livni [2021] established that every concept class $\mathcal{H}$ with finite Littlestone dimension $d$ is privately online learnable in the realizable setting. In particular, they proposed an algorithm that achieves an $O_{d}(\log T)$ mistake bound against an oblivious adversary. However, their approach yields a suboptimal $\tilde{O}_{d}(\sqrt{T})$ bound against an adaptive adversary. In this work, we present a new algorithm with a mistake bound of $O_{d}(\log T)$ against an adaptive adversary, closing this gap. We further investigate the problem in the agnostic setting, which is more general than the realizable setting as it does not impose any assumptions on the data. We give an algorithm that obtains a sublinear regret of $\tilde{O}_d(\sqrt{T})$ for generic Littlestone classes, demonstrating that they are also privately online learnable in the agnostic setting.


Compressing Chemistry Reveals Functional Groups

arXiv.org Artificial Intelligence

We introduce the first formal large-scale assessment of the utility of traditional chemical functional groups as used in chemical explanations. Our assessment employs a fundamental principle from computational learning theory: a good explanation of data should also compress the data. We introduce an unsupervised learning algorithm based on the Minimum Message Length (MML) principle that searches for substructures that compress around three million biologically relevant molecules. We demonstrate that the discovered substructures contain most human-curated functional groups as well as novel larger patterns with more specific functions. We also run our algorithm on 24 specific bioactivity prediction datasets to discover dataset-specific functional groups. Fingerprints constructed from dataset-specific functional groups are shown to significantly outperform other fingerprint representations, including the MACCS and Morgan fingerprint, when training ridge regression models on bioactivity regression tasks.


Pixi: Unified Software Development and Distribution for Robotics and AI

arXiv.org Artificial Intelligence

The reproducibility crisis in scientific computing constrains robotics research. Existing studies reveal that up to 70% of robotics algorithms cannot be reproduced by independent teams, while many others fail to reach deployment because creating shareable software environments remains prohibitively complex. These challenges stem from fragmented, multi-language, and hardware-software toolchains that lead to dependency hell. We present Pixi, a unified package-management framework that addresses these issues by capturing exact dependency states in project-level lockfiles, ensuring bit-for-bit reproducibility across platforms. Its high-performance SAT solver achieves up to 10x faster dependency resolution than comparable tools, while integration of the conda-forge and PyPI ecosystems removes the need for multiple managers. Adopted in over 5,300 projects since 2023, Pixi reduces setup times from hours to minutes and lowers technical barriers for researchers worldwide. By enabling scalable, reproducible, collaborative research infrastructure, Pixi accelerates progress in robotics and AI.