
Collaborating Authors

Shameli, Ali


Learning Compiler Pass Orders using Coreset and Normalized Value Prediction

arXiv.org Artificial Intelligence

Finding the optimal pass sequence of compilation can lead to a significant reduction in program size and/or improvement in program efficiency. Prior works on compilation pass ordering have two major drawbacks: they either require an excessive budget (in terms of compilation steps) at compile time or fail to generalize to unseen programs. In this paper, for code-size reduction tasks, we propose a novel pipeline that finds program-dependent pass sequences within 45 compilation calls. It first identifies a coreset of 50 pass sequences via greedy optimization of a submodular function, and then learns a policy with a Graph Neural Network (GNN) to pick the optimal sequence by predicting the normalized values of the pass sequences in the coreset. Despite its simplicity, our pipeline outperforms the default -Oz flag by an average of 4.7% over a large collection (4683) of unseen code repositories from diverse domains across 14 datasets. In comparison, previous approaches like reinforcement learning on the raw pass sequence space may take days to train due to sparse rewards, and may not generalize well to held-out programs from different domains. Our results demonstrate that existing human-designed compiler flags can be improved with a simple yet effective technique that transforms the raw action space into a small one with denser rewards.
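The greedy coreset step described above can be sketched as follows. This is a hypothetical illustration, not the paper's exact objective: it assumes a precomputed value matrix `V[p][s]` giving the code-size gain of program `p` under candidate pass sequence `s`, and greedily adds the sequence with the largest marginal gain to the coverage objective `sum_p max_{s in coreset} V[p][s]`, which is submodular.

```python
# Hypothetical sketch of greedy submodular coreset selection over pass
# sequences. V[p][s] = gain of program p under sequence s (illustrative data).
def greedy_coreset(values, k):
    """values: per-program lists of per-sequence gains; k: coreset size."""
    n_seqs = len(values[0])
    best = [0.0] * len(values)  # best gain covered so far, per program
    coreset = []
    for _ in range(k):
        gains = []
        for s in range(n_seqs):
            if s in coreset:
                gains.append(-1.0)  # already selected; marginal gain is 0
                continue
            # marginal gain of adding sequence s to the coreset
            gains.append(sum(max(v[s] - b, 0.0) for v, b in zip(values, best)))
        s_star = max(range(n_seqs), key=lambda s: gains[s])
        coreset.append(s_star)
        best = [max(b, v[s_star]) for v, b in zip(values, best)]
    return coreset

# Toy example: 3 programs, 4 candidate pass sequences.
V = [[0.1, 0.4, 0.0, 0.2],
     [0.3, 0.1, 0.5, 0.2],
     [0.0, 0.2, 0.1, 0.6]]
print(greedy_coreset(V, 2))  # → [3, 2]
```

The greedy rule enjoys the standard (1 - 1/e) approximation guarantee for monotone submodular maximization, which is what makes a small coreset (here, 50 sequences) a reasonable stand-in for the full pass-sequence space.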


Sobolev Norm Learning Rates for Conditional Mean Embeddings

arXiv.org Machine Learning

In the past decade, several studies have explored a new framework for embedding conditional distributions in reproducing kernel Hilbert spaces (RKHS). This approach seeks to represent a conditional distribution as an RKHS element, and thereby reduce the computation of conditional expectations to the evaluation of kernel inner products. Unlike other distribution learning approaches, which often involve density estimation and expensive numerical analysis, the conditional mean embedding (CME) framework exploits the popular kernel trick to allow distributions to be learned directly and efficiently from sample information, and does not require the target distribution to possess a density function. The broad generalizability and computational efficiency of conditional embeddings have led them to find many applications in reinforcement learning, hypothesis testing, and nonparametric inference [3-5, 17], where conditional relationships are often of pertinent interest. A central issue in the conditional embedding framework is the performance of the sample estimator. Despite their successful application, there has been limited study of optimal learning rates for conditional mean embeddings. Several foundational works [17, 18] established the consistency of the sample embedding estimator, exploring its convergence rate to a "true" embedding in the RKHS norm. These works framed the act of conditioning as a linear operator between two Hilbert spaces, mapping features of the independent variable in the input space to the mean embeddings of the respective conditional distributions in the output feature space.
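The reduction of conditional expectations to kernel inner products can be illustrated with the standard regularized sample estimator of the CME: for samples (x_i, y_i), one estimates E[f(Y) | X = x] as f(Y)^T (K_X + n λ I)^{-1} k_X(X, x). The sketch below is illustrative (Gaussian kernel, 1-D data, hypothetical parameter choices), not the estimator of any specific cited work.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # Gaussian (RBF) kernel matrix between 1-D sample arrays a and b.
    return np.exp(-gamma * (a[:, None] - b[None, :]) ** 2)

def cme_expectation(x_train, y_train, f, x_query, lam=1e-2):
    """Estimate E[f(Y) | X = x_query] with the regularized CME estimator:
    f(Y)^T (K_X + n*lam*I)^{-1} k_X(X, x_query)."""
    n = len(x_train)
    K = rbf(x_train, x_train)
    alpha = np.linalg.solve(K + n * lam * np.eye(n), rbf(x_train, x_query))
    return f(y_train) @ alpha

# Toy example: Y = X + small noise, so E[Y | X = 1] should be close to 1.
rng = np.random.default_rng(0)
x = rng.uniform(-2, 2, 200)
y = x + 0.1 * rng.normal(size=200)
est = cme_expectation(x, y, lambda t: t, np.array([1.0]))
print(est)  # close to 1.0
```

Note that no density of Y given X is ever estimated: only kernel matrices and a linear solve are involved, which is the computational appeal the abstract refers to.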


Sample Efficient Graph-Based Optimization with Noisy Observations

arXiv.org Machine Learning

We study the sample complexity of optimizing "hill-climbing friendly" functions defined on a graph under noisy observations. We define a notion of convexity, and we show that a variant of best-arm identification can find a near-optimal solution after a small number of queries that is independent of the size of the graph. For functions that have local minima and are nearly convex, we establish a sample complexity bound for classical simulated annealing under noisy observations. We demonstrate the effectiveness of the greedy algorithm with restarts and of simulated annealing on problems of graph-based nearest neighbor classification as well as a web document re-ranking application.
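The setting can be made concrete with a minimal sketch, assuming a noisy evaluation oracle and a neighbor map: simulated annealing walks the graph, always accepting downhill moves and accepting uphill moves with Boltzmann probability under a cooling schedule. All names, the cooling schedule, and the toy path graph are illustrative, not the paper's algorithm or analysis.

```python
import math
import random

def simulated_annealing(graph, noisy_f, start, steps=1000, t0=1.0):
    """Minimize a noisily observed function over graph nodes.
    graph: dict mapping each node to its list of neighbors.
    noisy_f: returns f(node) plus observation noise."""
    current = start
    for i in range(1, steps + 1):
        temp = t0 / i  # illustrative cooling schedule
        candidate = random.choice(graph[current])
        delta = noisy_f(candidate) - noisy_f(current)
        # accept downhill moves always, uphill moves with prob exp(-delta/temp)
        if delta < 0 or random.random() < math.exp(-delta / temp):
            current = candidate
    return current

# Toy example: path graph 0-1-...-9, f(v) = (v - 7)^2 observed with noise,
# so the walk should settle at the minimizer v = 7.
random.seed(0)
path = {v: [u for u in (v - 1, v + 1) if 0 <= u <= 9] for v in range(10)}
f = lambda v: (v - 7) ** 2 + random.gauss(0, 0.1)
best = simulated_annealing(path, f, start=0)
print(best)
```

A greedy variant with restarts, as mentioned in the abstract, would replace the acceptance rule with "move only if delta < 0" and rerun from random start nodes, keeping the best endpoint found.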