Computational Learning Theory
What's coming up at #ICML2021?
The thirty eighth International Conference on Machine Learning (ICML) is now underway and will run for the entirety of this week (18 – 24 July), in a virtual only format. There will five invited talks to enjoy, as well as workshops, tutorials, affinity events and socials. Challenges in Deploying and monitoring Machine Learning Systems INNF: Invertible Neural Networks, Normalizing Flows, and Explicit Likelihood Models ICML Workshop on Theoretic Foundation, Criticism, and Application Trend of Explainable AI Tackling Climate Change with Machine Learning Theory and Foundation of Continual Learning ICML 2021 Workshop on Unsupervised Reinforcement Learning Human-AI Collaboration in Sequential Decision-Making ICML Workshop on Representation Learning for Finance and E-Commerce Applications Reinforcement Learning for Real Life Uncertainty and Robustness in Deep Learning Interpretable Machine Learning in Healthcare 8th ICML Workshop on Automated Machine Learning (AutoML 2021) Theory and Practice of Differential Privacy The Neglected Assumptions In Causal Inference Machine Learning for Data: Automated Creation, Privacy, Bias ICML Workshop on Human in the Loop Learning (HILL) ICML Workshop on Algorithmic Recourse A Blessing in Disguise: The Prospects and Perils of Adversarial Machine Learning International Workshop on Federated Learning for User Privacy and Data Confidentiality in Conjunction with ICML 2021 (FL-ICML'21) Workshop on Socially Responsible Machine Learning ICML 2021 Workshop on Computational Biology Subset Selection in Machine Learning: From Theory to Applications Workshop on Computational Approaches to Mental Health @ ICML 2021 Workshop on Distribution-Free Uncertainty Quantification Information-Theoretic Methods for Rigorous, Responsible, and Reliable Machine Learning (ITR3) Beyond first-order methods in machine learning systems Self-Supervised Learning for Reasoning and Perception Time Series Workshop Workshop on Reinforcement Learning Theory Over-parameterization: Pitfalls and Opportunities
Transformer-based Machine Learning for Fast SAT Solvers and Logic Synthesis
Shi, Feng, Lee, Chonghan, Bashar, Mohammad Khairul, Shukla, Nikhil, Zhu, Song-Chun, Narayanan, Vijaykrishnan
CNF-based SAT and MaxSAT solvers are central to logic synthesis and verification systems. The increasing popularity of these constraint problems in electronic design automation encourages studies on different SAT problems and their properties for further computational efficiency. There has been both theoretical and practical success of modern Conflict-driven clause learning SAT solvers, which allows solving very large industrial instances in a relatively short amount of time. Recently, machine learning approaches provide a new dimension to solving this challenging problem. Neural symbolic models could serve as generic solvers that can be specialized for specific domains based on data without any changes to the structure of the model. In this work, we propose a one-shot model derived from the Transformer architecture to solve the MaxSAT problem, which is the optimization version of SAT where the goal is to satisfy the maximum number of clauses. Our model has a scale-free structure which could process varying size of instances. We use meta-path and self-attention mechanism to capture interactions among homogeneous nodes. We adopt cross-attention mechanisms on the bipartite graph to capture interactions among heterogeneous nodes. We further apply an iterative algorithm to our model to satisfy additional clauses, enabling a solution approaching that of an exact-SAT problem. The attention mechanisms leverage the parallelism for speedup. Our evaluation indicates improved speedup compared to heuristic approaches and improved completion rate compared to machine learning approaches.
Forster Decomposition and Learning Halfspaces with Noise
Diakonikolas, Ilias, Kane, Daniel M., Tzamos, Christos
The motivating application for this paper is the problem of(distribution-independent) PAC learning of halfspaces in the presence of label noise, and more specifically in the Massart (or bounded noise) model. Recent work [DGT19] obtained the first computationally efficient learning algorithm with non-trivial error guarantee for this problem. Interestingly, the sample complexity of the [DGT19] algorithm scales polynomially with the bit complexity of the examples (in addition, of course, to the dimension and the inverse of desired accuracy). This bit-complexity dependence in the sample complexity is an artifact of the algorithmic approach in [DGT19]. Information-theoretically, no such dependence is needed -- alas, the standard VC-dimension-based sample upper bound [MN06] is non-constructive. Motivated by this qualitative gap in our understanding, here we develop a methodology that leads to a computationally efficient learning algorithm for Massart halfspaces (matching the error guarantee of [DGT19]) with "strongly polynomial" sample complexity, i.e., sample complexity completely independent of the bit complexity of the examples. Halfspaces and Efficient Learnability We study the binary classification setting, where the goal is to learn a Boolean function from random labeled examples with noisy labels. Our focus is on the problem of learning halfspaces in Valiant's PAC learning model [Val84] when the labels have been corrupted by Massart noise [MN06].
Minimum Constraint Removal Problem for Line Segments is NP-hard
One of the most important objectives in motion planning is finding a feasible path from the starting point to a goal without collision with obstacles. The obstacles are either closed doors, which can be opened and removed by the robot, or are obstacles that cannot be passed which can be ignored by the robot with a penalty. Usually, there is no feasible path for some navigation. Recently, some researchers have focused on finding a path for the robot by minimizing the number of removed obstacles. For instance, in Stilman and Kuffner's paper [Stilman and Kuffner, 2005], the robot is able to move the obstacles around and clear its movement space.
Evidence for Long-Tails in SLS Algorithms
Wörz, Florian, Lorenz, Jan-Hendrik
Stochastic local search (SLS) is a successful paradigm for solving the satisfiability problem of propositional logic. A recent development in this area involves solving not the original instance, but a modified, yet logically equivalent one. Empirically, this technique was found to be promising as it improves the performance of state-of-the-art SLS solvers. Currently, there is only a shallow understanding of how this modification technique affects the runtimes of SLS solvers. Thus, we model this modification process and conduct an empirical analysis of the hardness of logically equivalent formulas. Our results are twofold. First, if the modification process is treated as a random process, a lognormal distribution perfectly characterizes the hardness; implying that the hardness is long-tailed. This means that the modification technique can be further improved by implementing an additional restart mechanism. Thus, as a second contribution, we theoretically prove that all algorithms exhibiting this long-tail property can be further improved by restarts. Consequently, all SAT solvers employing this modification technique can be enhanced.
Conditional Teaching Size
Garcia-Piqueras, Manuel, Hernández-Orallo, José
Recent research in machine teaching has explored the instruction of any concept expressed in a universal language. In this compositional context, new experimental results have shown that there exist data teaching sets surprisingly shorter than the concept description itself. However, there exists a bound for those remarkable experimental findings through teaching size and concept complexity that we further explore here. As concepts are rarely taught in isolation we investigate the best configuration of concepts to teach a given set of concepts, where those that have been acquired first can be reused for the description of new ones. This new notion of conditional teaching size uncovers new insights, such as the interposition phenomenon: certain prior knowledge generates simpler compatible concepts that increase the teaching size of the concept that we want to teach. This does not happen for conditional Kolmogorov complexity. Furthermore, we provide an algorithm that constructs optimal curricula based on interposition avoidance. This paper presents a series of theoretical results, including their proofs, and some directions for future work. New research possibilities in curriculum teaching in compositional scenarios are now wide open to exploration.
Exponential Weights Algorithms for Selective Learning
Qiao, Mingda, Valiant, Gregory
We study the selective learning problem introduced by Qiao and Valiant (2019), in which the learner observes $n$ labeled data points one at a time. At a time of its choosing, the learner selects a window length $w$ and a model $\hat\ell$ from the model class $\mathcal{L}$, and then labels the next $w$ data points using $\hat\ell$. The excess risk incurred by the learner is defined as the difference between the average loss of $\hat\ell$ over those $w$ data points and the smallest possible average loss among all models in $\mathcal{L}$ over those $w$ data points. We give an improved algorithm, termed the hybrid exponential weights algorithm, that achieves an expected excess risk of $O((\log\log|\mathcal{L}| + \log\log n)/\log n)$. This result gives a doubly exponential improvement in the dependence on $|\mathcal{L}|$ over the best known bound of $O(\sqrt{|\mathcal{L}|/\log n})$. We complement the positive result with an almost matching lower bound, which suggests the worst-case optimality of the algorithm. We also study a more restrictive family of learning algorithms that are bounded-recall in the sense that when a prediction window of length $w$ is chosen, the learner's decision only depends on the most recent $w$ data points. We analyze an exponential weights variant of the ERM algorithm in Qiao and Valiant (2019). This new algorithm achieves an expected excess risk of $O(\sqrt{\log |\mathcal{L}|/\log n})$, which is shown to be nearly optimal among all bounded-recall learners. Our analysis builds on a generalized version of the selective mean prediction problem in Drucker (2013); Qiao and Valiant (2019), which may be of independent interest.
Goal-Aware Neural SAT Solver
Ozolins, Emils, Freivalds, Karlis, Draguns, Andis, Gaile, Eliza, Zakovskis, Ronalds, Kozlovics, Sergejs
Modern neural networks obtain information about the problem and calculate the output solely from the input values. We argue that it is not always optimal, and the network's performance can be significantly improved by augmenting it with a query mechanism that allows the network to make several solution trials at run time and get feedback on the loss value on each trial. To demonstrate the capabilities of the query mechanism, we formulate an unsupervised (not dependant on labels) loss function for Boolean Satisfiability Problem (SAT) and theoretically show that it allows the network to extract rich information about the problem. We then propose a neural SAT solver with a query mechanism called QuerySAT and show that it outperforms the neural baseline on a wide range of SAT tasks and the classical baselines on SHA-1 preimage attack and 3-SAT task.
Boosting in the Presence of Massart Noise
Diakonikolas, Ilias, Impagliazzo, Russell, Kane, Daniel, Lei, Rex, Sorrell, Jessica, Tzamos, Christos
We study the problem of boosting the accuracy of a weak learner in the (distribution-independent) PAC model with Massart noise. In the Massart noise model, the label of each example $x$ is independently misclassified with probability $\eta(x) \leq \eta$, where $\eta<1/2$. The Massart model lies between the random classification noise model and the agnostic model. Our main positive result is the first computationally efficient boosting algorithm in the presence of Massart noise that achieves misclassification error arbitrarily close to $\eta$. Prior to our work, no non-trivial booster was known in this setting. Moreover, we show that this error upper bound is best possible for polynomial-time black-box boosters, under standard cryptographic assumptions. Our upper and lower bounds characterize the complexity of boosting in the distribution-independent PAC model with Massart noise. As a simple application of our positive result, we give the first efficient Massart learner for unions of high-dimensional rectangles.
A Simple and General Debiased Machine Learning Theorem with Finite Sample Guarantees
Chernozhukov, Victor, Newey, Whitney K., Singh, Rahul
Debiased machine learning is a meta algorithm based on bias correction and sample splitting to calculate confidence intervals for functionals (i.e. scalar summaries) of machine learning algorithms. For example, an analyst may desire the confidence interval for a treatment effect estimated with a neural network. We provide a nonasymptotic debiased machine learning theorem that encompasses any global or local functional of any machine learning algorithm that satisfies a few simple, interpretable conditions. Formally, we prove consistency, Gaussian approximation, and semiparametric efficiency by finite sample arguments. The rate of convergence is root-n for global functionals, and it degrades gracefully for local functionals. Our results culminate in a simple set of conditions that an analyst can use to translate modern learning theory rates into traditional statistical inference. The conditions reveal a new double robustness property for ill posed inverse problems.