Finding and Listing Front-door Adjustment Sets

arXiv.org Artificial Intelligence

Identifying the effects of new interventions from data is a significant challenge found across a wide range of the empirical sciences. A well-known strategy for identifying such effects is Pearl's front-door (FD) criterion (Pearl, 1995). The definition of the FD criterion is declarative, only allowing one to decide whether a specific set satisfies the criterion. In this paper, we present algorithms for finding and enumerating possible sets satisfying the FD criterion in a given causal diagram. These results are useful in facilitating the practical applications of the FD criterion for causal effects estimation and helping scientists to select estimands with desired properties, e.g., based on cost, feasibility of measurement, or statistical power.
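For orientation, the estimand that an FD set yields can be written out explicitly. The display below is the standard front-door adjustment formula from Pearl (1995), stated for a set $Z$ that satisfies the FD criterion relative to treatment $X$ and outcome $Y$; the notation is an assumption of this note rather than something given in the abstract.

$$P(y \mid do(x)) = \sum_{z} P(z \mid x) \sum_{x'} P(y \mid x', z)\, P(x')$$

Each candidate set $Z$ returned by an enumeration procedure would give a different instance of this expression, which is why listing several sets lets one compare estimands by cost or measurability.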


Nest Your Adaptive Algorithm for Parameter-Agnostic Nonconvex Minimax Optimization

arXiv.org Artificial Intelligence

Adaptive algorithms like AdaGrad and AMSGrad are successful in nonconvex optimization owing to their parameter-agnostic ability -- requiring neither a priori knowledge of problem-specific parameters nor tuning of learning rates. However, when it comes to nonconvex minimax optimization, direct extensions of such adaptive optimizers without proper time-scale separation may fail to work in practice. We provide an example proving that the simple combination of Gradient Descent Ascent (GDA) with adaptive stepsizes can diverge if the primal-dual stepsize ratio is not carefully chosen; hence, a fortiori, such adaptive extensions are not parameter-agnostic. To address the issue, we formally introduce a Nested Adaptive framework, NeAda for short, that carries an inner loop for adaptively maximizing the dual variable with controllable stopping criteria and an outer loop for adaptively minimizing the primal variable. Such a mechanism can be equipped with off-the-shelf adaptive optimizers and automatically balances the progress in the primal and dual variables. Theoretically, for nonconvex-strongly-concave minimax problems, we show that NeAda can achieve the near-optimal $\tilde{O}(\epsilon^{-2})$ and $\tilde{O}(\epsilon^{-4})$ gradient complexities respectively in the deterministic and stochastic settings, without prior information on the problem's smoothness and strong concavity parameters. To the best of our knowledge, this is the first algorithm that simultaneously achieves near-optimal convergence rates and parameter-agnostic adaptation in the nonconvex minimax setting. Numerically, we further illustrate the robustness of the NeAda family with experiments on simple test functions and a real-world application.
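The nested structure described above (an adaptive inner loop on the dual variable with a stopping criterion, wrapped in an adaptive outer loop on the primal variable) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy objective $f(x, y) = xy - y^2/2$, the AdaGrad-norm style stepsizes, the stopping rule $|\nabla_y f|^2 \le 1/(t+1)$, and the function name `nested_adaptive_sketch`.

```python
import math


def grad_x(x: float, y: float) -> float:
    """Partial derivative in x of the toy objective f(x, y) = x*y - 0.5*y**2."""
    return y


def grad_y(x: float, y: float) -> float:
    """Partial derivative in y of the same objective (strongly concave in y)."""
    return x - y


def nested_adaptive_sketch(x0=2.0, y0=-1.0, outer_steps=200, max_inner=10_000):
    """Hypothetical nested loop: adaptive ascent on y inside adaptive descent on x."""
    x, y = x0, y0
    acc_x = 0.0  # running sum of squared primal gradients (AdaGrad-norm style)
    acc_y = 0.0  # running sum of squared dual gradients
    for t in range(outer_steps):
        tol = 1.0 / (t + 1)  # assumed stopping threshold, tightened every outer step
        for _ in range(max_inner):
            gy = grad_y(x, y)
            if gy * gy <= tol:
                break  # dual variable is accurate enough for this outer step
            acc_y += gy * gy
            y += gy / math.sqrt(acc_y)  # adaptive ascent step, no tuned stepsize
        gx = grad_x(x, y)
        acc_x += gx * gx
        if acc_x > 0.0:
            x -= gx / math.sqrt(acc_x)  # adaptive descent step
    return x, y


x_final, y_final = nested_adaptive_sketch()
print(f"x = {x_final:.4f}, y = {y_final:.4f}")
```

For this toy objective the inner maximizer is $y = x$, so the outer problem reduces to minimizing $x^2/2$ and both variables should drift toward zero. No stepsize ratio between the two loops is chosen by hand; the shrinking tolerance is what couples them.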


E2R: a Hierarchical-Learning inspired Novelty-Search method to generate diverse repertoires of grasping trajectories

arXiv.org Artificial Intelligence

Robotic grasping refers to the task of making a robotic system pick up an object by applying forces and torques on its surface. Despite recent advances in data-driven approaches, grasping remains an unsolved problem. Most work on this task relies on priors and heavy constraints to avoid the exploration problem. Novelty Search (NS) refers to evolutionary algorithms that replace selection of the best-performing individuals with selection of the most novel ones. Such methods have already shown promising results on hard exploration problems. In this work, we introduce a new NS-based method that can generate large datasets of grasping trajectories in a platform-agnostic manner. Inspired by the hierarchical learning paradigm, our method decouples approach and prehension to make the behavioral space smoother. Experiments conducted on 3 different robot-gripper setups and on several standard objects show that our method outperforms the state of the art for generating diverse repertoires of grasping trajectories, achieving a higher successful run ratio as well as better diversity for both approach and prehension. Some of the generated solutions have been successfully deployed on a real robot, showing the exploitability of the obtained repertoires.


Multi-step Planning for Automated Hyperparameter Optimization with OptFormer

#artificialintelligence

Unlike myopic HPO methods, planning-based approaches fundamentally require building models of the future to assess the impact of a current decision on later timesteps. Though these methods also rely on a GP as a surrogate model, each point in multi-step planning involves fantasizing/imagining an updated GP posterior $(\tilde{f}_{t+1} \mid \tilde{x}_t), \ldots, (\tilde{f}_{t+h} \mid \tilde{x}_t, \tilde{x}_{t+1}, \ldots, \tilde{x}_{t+h-1})$ based on simulated choices from lookaheads $\{(\tilde{x}_t, \tilde{y}_t), \ldots, (\tilde{x}_{t+h-1}, \tilde{y}_{t+h-1})\}$ (Lam et al., 2016; Jiang et al., 2020). Note that we use $\tilde{x}_t$ to represent a fantasized decision, while $x_t$ is the actual choice made at timestep $t$. Whilst multi-step planning is promising, constructing the posterior of a GP model requires matrix inversion, which is a compute-intensive operation (Cormen et al., 2022). Even outside of this limitation, traditional planning-based approaches are compute-intensive due to (i) poor scaling behavior of the search tree, $O(q^h)$, where $q$ is the number of choices at each decision point for each lookahead step (Lam et al., 2016; Lam and Willcox, 2017), which forces most methods to explore short horizons, typically $h \in \{1, 2\}$, and (ii) nested expectation and maximization: marginalizing future observation $\tilde{y}_{t+j}, j < h$ and global search on the acquisition function to obtain query $\tilde{x}_{t+j}$ at every lookahead step.
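To make the search-tree scaling concrete, here is a small sketch that enumerates every lookahead path when there are q choices per decision point and a horizon of h steps, giving q to the power h paths. The function name `count_lookahead_paths` and the value q = 10 are hypothetical choices for illustration only.

```python
from itertools import product


def count_lookahead_paths(q: int, h: int) -> int:
    """Count the distinct sequences of fantasized choices by enumerating them."""
    choices = range(q)
    # Each path is one choice per lookahead step, so paths are h-tuples over q options.
    return sum(1 for _ in product(choices, repeat=h))


for horizon in (1, 2, 3, 4):
    paths = count_lookahead_paths(q=10, h=horizon)
    print(f"h={horizon}: {paths} paths")
```

With ten choices per step the count grows by a factor of ten for each extra lookahead step, which is consistent with the observation that most methods stay at very short horizons.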


Searching for Better Database Queries in the Outputs of Semantic Parsers

arXiv.org Artificial Intelligence

The task of generating a database query from a question in natural language suffers from ambiguity and insufficiently precise description of the goal. The problem is amplified when the system needs to generalize to databases unseen at training. In this paper, we consider the case when, at the test time, the system has access to an external criterion that evaluates the generated queries. The criterion can vary from checking that a query executes without errors to verifying the query on a set of tests. In this setting, we augment neural autoregressive models with a search algorithm that looks for a query satisfying the criterion. We apply our approach to the state-of-the-art semantic parsers and report that it allows us to find many queries passing all the tests on different datasets.
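The simplest criterion mentioned above, that a query executes without errors, can be sketched with Python's built-in `sqlite3` module. This is a deliberately reduced illustration and not the paper's search algorithm: it assumes the parser has already produced a ranked list of candidate strings, and the table, the candidate queries, and the helper names `executes_without_error` and `first_passing_query` are all hypothetical.

```python
import sqlite3


def executes_without_error(conn: sqlite3.Connection, sql: str) -> bool:
    """External criterion: the query runs on the database without raising."""
    try:
        conn.execute(sql).fetchall()
        return True
    except sqlite3.Error:
        return False


def first_passing_query(candidates, conn, criterion):
    """Return the highest-ranked candidate that satisfies the criterion, or None."""
    for sql in candidates:
        if criterion(conn, sql):
            return sql
    return None


# Toy database standing in for a schema unseen at training time.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE singer (name TEXT, age INTEGER)")
conn.executemany("INSERT INTO singer VALUES (?, ?)", [("Ann", 31), ("Bo", 45)])

# Hypothetical parser outputs, ordered from most to least likely.
candidates = [
    "SELECT nme FROM singer WHERE age > 40",    # misspelled column
    "SELECT name FROM singers WHERE age > 40",  # nonexistent table
    "SELECT name FROM singer WHERE age > 40",   # valid query
]

best = first_passing_query(candidates, conn, executes_without_error)
print(best)
```

A stricter criterion, such as comparing results against a set of tests, could be passed in place of `executes_without_error` without changing the selection loop.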


Improved Generalization Bound and Learning of Sparsity Patterns for Data-Driven Low-Rank Approximation

arXiv.org Artificial Intelligence

Learning sketching matrices for fast and accurate low-rank approximation (LRA) has gained increasing attention. Recently, Bartlett, Indyk, and Wagner (COLT 2022) presented a generalization bound for the learning-based LRA. Specifically, for rank-$k$ approximation using an $m \times n$ learned sketching matrix with $s$ non-zeros in each column, they proved an $\tilde{\mathrm{O}}(nsm)$ bound on the \emph{fat shattering dimension} ($\tilde{\mathrm{O}}$ hides logarithmic factors). We build on their work and make two contributions. 1. We present a better $\tilde{\mathrm{O}}(nsk)$ bound ($k \le m$). En route to obtaining this result, we give a low-complexity \emph{Goldberg--Jerrum algorithm} for computing pseudo-inverse matrices, which would be of independent interest. 2. We alleviate an assumption of the previous study that sketching matrices have a fixed sparsity pattern. We prove that learning positions of non-zeros increases the fat shattering dimension only by ${\mathrm{O}}(ns\log n)$. In addition, experiments confirm the practical benefit of learning sparsity patterns.


Learning to branch with Tree MDPs

arXiv.org Artificial Intelligence

State-of-the-art Mixed Integer Linear Program (MILP) solvers combine systematic tree search with a plethora of hard-coded heuristics, such as the branching rule. The idea of learning branching rules from data has received increasing attention recently, and promising results have been obtained by learning fast approximations of the strong branching expert. In this work, we instead propose to learn branching rules from scratch via Reinforcement Learning (RL). We revisit the work of Etheve et al. (2020) and propose tree Markov Decision Processes, or tree MDPs, a generalization of temporal MDPs that provides a more suitable framework for learning to branch. We derive a tree policy gradient theorem, which exhibits a better credit assignment compared to its temporal counterpart. We demonstrate through computational experiments that tree MDPs improve the learning convergence, and offer a promising framework for tackling the learning-to-branch problem in MILPs.


BLOX: Macro Neural Architecture Search Benchmark and Algorithms

arXiv.org Artificial Intelligence

Neural architecture search (NAS) has been successfully used to design numerous high-performance neural networks. However, NAS is typically compute-intensive, so most existing approaches restrict the search to deciding the operations and topological structure of a single block only; the same block is then stacked repeatedly to form an end-to-end model. Although such an approach reduces the size of the search space, recent studies show that a macro search space, which allows blocks in a model to be different, can lead to better performance. To provide a systematic study of the performance of NAS algorithms on a macro search space, we release Blox - a benchmark that consists of 91k unique models trained on the CIFAR-100 dataset. The dataset also includes runtime measurements of all the models on a diverse set of hardware platforms. We perform extensive experiments to compare existing algorithms that are well studied on cell-based search spaces with the emerging blockwise approaches that aim to make NAS scalable to much larger macro search spaces.


Online Minimax Multiobjective Optimization: Multicalibeating and Other Applications

arXiv.org Artificial Intelligence

We introduce a simple but general online learning framework in which a learner plays against an adversary in a vector-valued game that changes every round. Even though the learner's objective is not convex-concave (and so the minimax theorem does not apply), we give a simple algorithm that can compete with the setting in which the adversary must announce their action first, with optimally diminishing regret. We demonstrate the power of our framework by using it to (re)derive optimal bounds and efficient algorithms across a variety of domains, ranging from multicalibration to a large set of no regret algorithms, to a variant of Blackwell's approachability theorem for polytopes with fast convergence rates. As a new application, we show how to ``(multi)calibeat'' an arbitrary collection of forecasters -- achieving an exponentially improved dependence on the number of models we are competing against, compared to prior work.


ripgrep is faster than {grep, ag, git grep, ucg, pt, sift} - Andrew Gallant's Blog

#artificialintelligence

In this article I will introduce a new command line search tool, ripgrep, that combines the usability of The Silver Searcher (an ack clone) with the raw performance of GNU grep. We will attempt to do the impossible: a fair benchmark comparison between several popular code search tools. As someone who has worked on text search in Rust in their free time for the last 2.5 years, and as the author of both ripgrep and the underlying regular expression engine, I will use this opportunity to provide detailed insights into the performance of each code search tool. No benchmark will go unscrutinized! NOTE: I'm hearing reports from some people that rg isn't as fast as I've claimed on their data. I'd love to help explain what's going on, but to do that, I'll need to be able to reproduce your results. If you file an issue with something I can reproduce, I'd be happy to try and explain it. Why should you use ripgrep over any other search tool? In other words, use ripgrep if you like speed, filtering by default, fewer bugs and Unicode support. I'd like to try to convince you why you shouldn't use ripgrep. Often, this is far more revealing than reasons why I think you should use ripgrep. Despite initially not wanting to add every feature under the sun to ripgrep, over time, ripgrep has grown support for most features found in other file searching tools. This includes searching for results spanning across multiple lines, and opt-in support for PCRE2, which provides look-around and backreference support. The binary name for ripgrep is rg. Binaries for ripgrep are available for Windows, Mac and Linux. Linux binaries are static executables. Windows binaries are available either as built with MinGW (GNU) or with Microsoft Visual C++ (MSVC). When possible, prefer MSVC over GNU, but you'll need to have the Microsoft VC++ 2015 redistributable installed.
If you're a Homebrew user, then you can install it like so: If you're an Archlinux user, then you can install ripgrep from the official repos: If you're a Rust programmer, ripgrep can be installed with cargo: If you'd like to build ripgrep from source, that is also easy to do. If you have a Rust nightly compiler, then you can enable optional SIMD acceleration like so, which is used in all benchmarks reported in this article. The command line usage of ripgrep doesn't differ much from other tools that perform a similar function, so you probably already know how to use ripgrep. The full details can be found in rg --help, but let's go on a whirlwind tour. Coloring works on Windows too! Colors can be controlled more granularly with the --color flag. One last thing before we get started: generally speaking, ripgrep assumes the input it is reading is UTF-8. However, if ripgrep notices a file is encoded as UTF-16, then it will know how to search it. For other encodings, you'll need to explicitly specify them with the -E/--encoding flag. To recursively search the current directory, while respecting all .gitignore