Reviews: Efficient Forward Architecture Search

Neural Information Processing Systems

This paper proposes a novel neural architecture search method, dubbed Petridish, which is based on gradient boosting of "weak learners" (i.e. ...). Originality: The main contribution of the paper is applying basic ideas from gradient boosting of weak learners to the task of neural architecture search. This is an original idea, which allows a more guided exploration of the space of neural architectures than the random steps taken, e.g., in evolutionary algorithms. Most related work is adequately discussed. The connections to, and differences from, NAS methods that combine network morphisms with evolutionary algorithms should be discussed in more detail: these methods explore the search space with similar steps (modifying a model by small incremental additions), but select the steps randomly rather than by gradient boosting.


Reviews: Fast AutoAugment

Neural Information Processing Systems

While I feel that the new random baselines significantly strengthen the paper's results on CIFAR-100, random baselines are not provided for CIFAR-10, SVHN, or ImageNet. I've updated my score from a 6 to a 7, based on the random baselines for CIFAR-100 and the authors' promise to clarify their evaluation measure in the final submission. However, Cubuk et al.'s original algorithm is extremely resource-intensive. The main contribution of this paper is an algorithm that can operate on the same search space and come up with data augmentation schemes orders of magnitude more efficiently. The most closely related work I'm aware of is Population Based Augmentation (ICML 2019), which tries to solve the same problem in a different way.


Reviews: Fast AutoAugment

Neural Information Processing Systems

This paper is concerned with automating the search for data augmentation transformations for image classification with DNN models. It does so in a way that avoids having to re-train (or fine-tune) the model for every transformation scored. This leads to a method that is much faster than the previous SotA (AutoAugment) while being shown to provide results of similar quality. Both this work and AutoAugment use a carefully chosen search space, over which neither strongly outperforms random search, but the dramatic reduction in resource needs relative to AutoAugment justifies publication. However, the authors are asked to provide further results in the final version, in particular a more thorough comparison against random search baselines on the same advanced search space, including random repetitions, to convince readers that their method improves enough over random search to justify its added complexity.


Review for NeurIPS paper: Erdos Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs

Neural Information Processing Systems

Summary and Contributions: This paper considers the problem of learning to do combinatorial optimization on graphs. In particular, it focuses on a set of constrained minimization problems: given an objective function and a constraint, find a set of nodes minimizing the objective function subject to the constraint. This is a broad family that includes many NP-hard problems. The goal is to train a neural network such that, when given a new instance of one of these problems, it can efficiently compute a solution satisfying the constraints whose cost is "close" to the optimal cost. This work proposes a novel framework for unsupervised combinatorial optimization on graphs, which is inspired by Erdos's probabilistic proof technique and makes it more likely that the outputs produce a valid solution when compared to previous approaches.


Review for NeurIPS paper: Erdos Goes Neural: an Unsupervised Learning Framework for Combinatorial Optimization on Graphs

Neural Information Processing Systems

This is a surprising, novel, and principled framework for unsupervised ML-based combinatorial optimization. The paper should be improved following suggestions discussed in the rebuttal, but this should be straightforward. Overall, I'll quote R2: "Unlike many recent papers in this space which are rather incremental in combining GNN with reinforcement learning in various ways, this paper proposes a fresh, fundamentally new perspective."


Reviews: Search on the Replay Buffer: Bridging Planning and Reinforcement Learning

Neural Information Processing Systems

Compact search spaces would confer computational benefits if nothing else. Overall, studying how compact representations of the state might compare when used inside graph search seems like a nice way to evaluate just how much utility is added by the distributional RL component of the overall approach.


Decision Making in Changing Environments: Robustness, Query-Based Learning, and Differential Privacy

arXiv.org Machine Learning

We study the problem of interactive decision making in which the underlying environment changes over time subject to given constraints. We propose a framework, which we call hybrid Decision Making with Structured Observations (hybrid DMSO), that provides an interpolation between the stochastic and adversarial settings of decision making. Within this framework, we can analyze local differentially private (LDP) decision making, query-based learning (in particular, SQ learning), and robust and smooth decision making under the same umbrella, deriving upper and lower bounds based on variants of the Decision-Estimation Coefficient (DEC). We further establish strong connections between the DEC's behavior, the SQ dimension, local minimax complexity, learnability, and joint differential privacy. To showcase the framework's power, we provide new results for contextual bandits under the LDP constraint.


Data-efficient Performance Modeling via Pre-training

arXiv.org Artificial Intelligence

Performance models are essential for automatic code optimization, enabling compilers to predict the effects of code transformations on performance and guide search for optimal transformations. Building state-of-the-art performance models with deep learning, however, requires vast labeled datasets of random programs -- an expensive and time-consuming process, stretching over months. This paper introduces a self-supervised pre-training scheme with autoencoders to reduce the need for labeled data. By pre-training on a large dataset of random programs, the autoencoder learns representations of code and transformations, which are then used to embed programs for the performance model. Implemented in the Tiramisu autoscheduler, our approach improves model accuracy with less data. For example, to achieve a MAPE of 20.72%, the original model requires 18 million data points, whereas our method achieves a similar MAPE of 22.44% with only 3.6 million data points, reducing data requirements by 5x.
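The figures quoted above can be made concrete with a small sketch. It assumes the standard definition of MAPE (mean absolute percentage error); the function names `mape` and `data_reduction_factor` are hypothetical and are not taken from the Tiramisu autoscheduler or from the paper's code.

```python
def mape(actual, predicted):
    """Mean absolute percentage error, in percent (standard definition, assumed here)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Average of |actual - predicted| / |actual|, scaled to a percentage.
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


def data_reduction_factor(baseline_points, reduced_points):
    """How many times fewer labeled data points the reduced setup uses."""
    return baseline_points / reduced_points


# Toy measurements (illustrative values only, not results from the paper).
toy_error = mape([100.0, 200.0], [110.0, 180.0])

# The data-set sizes quoted in the abstract: 18 million vs. 3.6 million points.
factor = data_reduction_factor(18_000_000, 3_600_000)
```

With the sizes quoted in the abstract, `factor` evaluates to 5.0, which matches the stated 5x reduction in data requirements.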


Decoupled SGDA for Games with Intermittent Strategy Communication

arXiv.org Artificial Intelligence

We focus on reducing communication overhead in multiplayer games, where frequently exchanging strategies between players is not feasible and players have noisy or outdated strategies of the other players. We introduce Decoupled SGDA, a novel adaptation of Stochastic Gradient Descent Ascent (SGDA). In this approach, players independently update their strategies based on outdated opponent strategies, with periodic synchronization to align strategies. For Strongly-Convex-Strongly-Concave (SCSC) games, we demonstrate that Decoupled SGDA achieves near-optimal communication complexity comparable to the best-known GDA rates. For weakly coupled games where the interaction between players is lower relative to the non-interactive part of the game, Decoupled SGDA significantly reduces communication costs compared to standard SGDA. Our findings extend to multi-player games. To provide insights into the effect of communication frequency and convergence, we extensively study the convergence of Decoupled SGDA for quadratic minimax problems. Lastly, in settings where the noise over the players is imbalanced, Decoupled SGDA significantly outperforms federated minimax methods.
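The update scheme described above (independent local updates against outdated opponent strategies, with periodic synchronization) can be sketched on a scalar quadratic minimax problem. This is a minimal illustration under stated assumptions, not the authors' implementation: the objective f(x, y) = (a/2)x^2 + bxy - (c/2)y^2, the function name `decoupled_sgda`, and all parameter values are hypothetical choices made for the example.

```python
import random


def decoupled_sgda(a=1.0, b=0.2, c=1.0, lr=0.1, local_steps=5,
                   rounds=50, sigma=0.0, seed=0):
    """Sketch of decoupled gradient descent-ascent on
    f(x, y) = (a/2) x^2 + b x y - (c/2) y^2  (assumed toy objective).

    Player x minimizes f, player y maximizes f. Strategies are exchanged
    only once per round; in between, each player uses a stale copy of the
    other player's strategy.
    """
    rng = random.Random(seed)
    x, y = 1.0, 1.0
    for _ in range(rounds):
        # Synchronization: both players receive the other's current strategy.
        x_stale, y_stale = x, y
        # Player x: local descent steps against the outdated y.
        for _ in range(local_steps):
            grad_x = a * x + b * y_stale + sigma * rng.gauss(0.0, 1.0)
            x -= lr * grad_x
        # Player y: local ascent steps against the outdated x.
        for _ in range(local_steps):
            grad_y = b * x_stale - c * y + sigma * rng.gauss(0.0, 1.0)
            y += lr * grad_y
    return x, y


# Noise-free run on a weakly coupled game (small b relative to a and c).
x_final, y_final = decoupled_sgda()
```

In this toy setting the unique saddle point is (0, 0), and the noise-free run approaches it while exchanging strategies only once every `local_steps` updates. Setting `sigma` above zero adds Gaussian gradient noise, which is one simple way to mimic the stochastic setting.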


Causal Discovery via Bayesian Optimization

arXiv.org Machine Learning

Existing score-based methods for directed acyclic graph (DAG) learning from observational data struggle to recover the causal graph accurately and sample-efficiently. To overcome this, we propose DrBO (DAG recovery via Bayesian Optimization), a novel DAG learning framework that leverages Bayesian optimization (BO) to find high-scoring DAGs. We show that, by carefully choosing which promising DAGs to explore, we can find higher-scoring ones much more efficiently. To address the scalability issues of conventional BO in DAG learning, we replace the Gaussian Processes commonly employed in BO with dropout neural networks, trained in a continual manner, which allows for (i) flexibly modeling the DAG scores without overfitting, (ii) incorporating uncertainty into the estimated scores, and (iii) scaling with the number of evaluations. As a result, DrBO is computationally efficient and can find an accurate DAG in fewer trials and less time than existing state-of-the-art methods. This is demonstrated through an extensive set of empirical evaluations on many challenging settings with both synthetic and real data. Our implementation is available at https://github.com/baosws/DrBO.