Bastani, Osbert
Large Scale Multi-Task Bayesian Optimization with Large Language Models
Zeng, Yimeng, Maus, Natalie, Jones, Haydn Thomas, Tao, Jeffrey, Wan, Fangping, Torres, Marcelo Der Torossian, de la Fuente-Nunez, Cesar, Marcus, Ryan, Bastani, Osbert, Gardner, Jacob R.
In multi-task Bayesian optimization, the goal is to leverage experience from optimizing existing tasks to improve the efficiency of optimizing new ones. While approaches using multi-task Gaussian processes or deep kernel transfer exist, the performance improvement is marginal when scaling to more than a moderate number of tasks. We introduce a novel approach leveraging large language models (LLMs) to learn from, and improve upon, previous optimization trajectories, scaling to approximately 2000 distinct tasks. Specifically, we propose an iterative framework in which an LLM is fine-tuned using the high-quality solutions produced by BayesOpt to generate improved initializations that accelerate convergence for future optimization tasks based on previous search trajectories. We evaluate our method on two distinct domains: database query optimization and antimicrobial peptide design. Results demonstrate that our approach creates a positive feedback loop, where the LLM's generated initializations gradually improve, leading to better optimization performance. As this feedback loop continues, we find that the LLM is eventually able to generate solutions to new tasks in just a few shots that are better than the solutions produced "from scratch" by Bayesian optimization while simultaneously requiring significantly fewer oracle calls.
LLM Program Optimization via Retrieval Augmented Search
Anupam, Sagnik, Shypula, Alexander, Bastani, Osbert
With the advent of large language models (LLMs), there has been a great deal of interest in applying them to solve difficult programming tasks. Recent work has demonstrated their potential at program optimization, a key challenge in programming languages research. We propose a blackbox adaptation method called Retrieval Augmented Search (RAS) that performs beam search over candidate optimizations; at each step, it retrieves in-context examples from a given training dataset of slow-fast program pairs to guide the LLM. Critically, we find that performing contextual retrieval based on an LLM-generated natural language description significantly outperforms retrieval based on the source code. In addition, we propose a method called AEGIS for improving interpretability by decomposing training examples into "atomic edits" that are significantly more incremental in nature. We show that RAS performs 1.8$\times$ better than prior state-of-the-art blackbox adaptation strategies, and that AEGIS performs 1.37$\times$ better while performing significantly smaller edits.
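The search loop described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the callables describe, retrieve, propose, and runtime are hypothetical stand-ins for the LLM-generated description, the retrieval over slow-fast pairs, the LLM's candidate generation, and the runtime measurement, and programs are assumed to be hashable values such as source strings.

```python
def retrieval_augmented_beam_search(program, describe, retrieve, propose,
                                    runtime, beam_width=2, steps=3):
    """Beam search over candidate optimizations (illustrative sketch).

    All four callables are hypothetical placeholders:
      describe(p)      -> natural language description of program p
      retrieve(text)   -> in-context examples matching that description
      propose(p, exs)  -> candidate rewrites of p given the examples
      runtime(p)       -> cost to minimize (lower is better)
    """
    beam = [program]
    best = program
    for _ in range(steps):
        candidates = []
        for current in beam:
            # Retrieval is keyed on the description, not on the source code.
            examples = retrieve(describe(current))
            candidates.extend(propose(current, examples))
        if not candidates:
            break
        # Deduplicate while preserving order, then keep the fastest few.
        unique = list(dict.fromkeys(candidates))
        beam = sorted(unique, key=runtime)[:beam_width]
        if runtime(beam[0]) < runtime(best):
            best = beam[0]
    return best


# Toy usage: "programs" are integers and their value is their runtime.
fastest = retrieval_augmented_beam_search(
    10,
    describe=str,
    retrieve=lambda text: [1, 2],
    propose=lambda p, examples: [p - e for e in examples],
    runtime=lambda p: p,
)
```

In the toy run each step can shave at most 2 off the runtime, so three steps take the starting value 10 down to 4.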
Diversity By Design: Leveraging Distribution Matching for Offline Model-Based Optimization
Yao, Michael S., Gee, James C., Bastani, Osbert
The goal of offline model-based optimization (MBO) is to propose new designs that maximize a reward function given only an offline dataset. However, an important desideratum is to also propose a diverse set of final candidates that capture many optimal and near-optimal design configurations. We propose Diversity in Adversarial Model-based Optimization (DynAMO) as a novel method to introduce design diversity as an explicit objective into any MBO problem. Our key insight is to formulate diversity as a distribution matching problem where the distribution of generated designs captures the inherent diversity contained within the offline dataset. Extensive experiments spanning multiple scientific domains show that DynAMO can be used with common optimization methods to significantly improve the diversity of proposed designs while still discovering high-quality candidates.
Vision Language Models are In-Context Value Learners
Ma, Yecheng Jason, Hejna, Joey, Wahid, Ayzaan, Fu, Chuyuan, Shah, Dhruv, Liang, Jacky, Xu, Zhuo, Kirmani, Sean, Xu, Peng, Driess, Danny, Xiao, Ted, Tompson, Jonathan, Bastani, Osbert, Jayaraman, Dinesh, Yu, Wenhao, Zhang, Tingnan, Sadigh, Dorsa, Xia, Fei
Predicting temporal progress from visual trajectories is important for intelligent robots that can learn, adapt, and improve. However, learning such a progress estimator, or temporal value function, across different tasks and domains requires both a large amount of diverse data and methods which can scale and generalize. To address these challenges, we present Generative Value Learning (GVL), a universal value function estimator that leverages the world knowledge embedded in vision-language models (VLMs) to predict task progress. Naively asking a VLM to predict values for a video sequence performs poorly due to the strong temporal correlation between successive frames. Instead, GVL poses value estimation as a temporal ordering problem over shuffled video frames; this seemingly more challenging task encourages VLMs to more fully exploit their underlying semantic and temporal grounding capabilities to differentiate frames based on their perceived task progress, consequently producing significantly better value predictions. Without any robot- or task-specific training, GVL can predict effective values in-context, both zero-shot and few-shot, for more than 300 distinct real-world tasks across diverse robot platforms, including challenging bimanual manipulation tasks. Furthermore, we demonstrate that GVL permits flexible multi-modal in-context learning via examples from heterogeneous tasks and embodiments, such as human videos. The generality of GVL enables various downstream applications pertinent to visuomotor policy learning, including dataset filtering, success detection, and advantage-weighted regression -- all without any model training or finetuning.
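The shuffling idea in this abstract can be sketched in a few lines. This is only an illustration of the bookkeeping, assuming a hypothetical predict_values callable in place of the VLM query; the prompt format and any further details of GVL are not reproduced here.

```python
import random


def values_via_shuffled_frames(frames, predict_values, rng=random):
    """Query a value predictor on shuffled frames, then restore time order.

    predict_values is a hypothetical stand-in for the VLM call: it takes a
    list of frames and returns one progress value per frame, in the order
    the frames were given.
    """
    order = list(range(len(frames)))
    rng.shuffle(order)  # hide the temporal order from the predictor
    shuffled = [frames[i] for i in order]
    shuffled_values = predict_values(shuffled)

    # Map each prediction back to the frame's original position.
    values = [None] * len(frames)
    for position, original_index in enumerate(order):
        values[original_index] = shuffled_values[position]
    return values


# Toy usage: frame names encode their true progress, so the toy predictor
# can read it back regardless of presentation order.
frames = ["f0", "f1", "f2", "f3"]
toy_predictor = lambda batch: [int(name[1:]) / 3 for name in batch]
recovered = values_via_shuffled_frames(frames, toy_predictor, random.Random(0))
```

Because the predictor never sees the frames in temporal order, it cannot simply emit a monotonically increasing sequence; it has to judge each frame on its content.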
Eurekaverse: Environment Curriculum Generation via Large Language Models
Liang, William, Wang, Sam, Wang, Hung-Ju, Bastani, Osbert, Jayaraman, Dinesh, Ma, Yecheng Jason
Recent work has demonstrated that a promising strategy for teaching robots a wide range of complex skills is by training them on a curriculum of progressively more challenging environments. However, developing an effective curriculum of environment distributions currently requires significant expertise, which must be repeated for every new domain. Our key insight is that environments are often naturally represented as code. Thus, we probe whether effective environment curriculum design can be achieved and automated via code generation by large language models (LLMs). In this paper, we introduce Eurekaverse, an unsupervised environment design algorithm that uses LLMs to sample progressively more challenging, diverse, and learnable environments for skill training. We validate Eurekaverse's effectiveness in the domain of quadrupedal parkour learning, in which a quadruped robot must traverse a variety of obstacle courses. The automatic curriculum designed by Eurekaverse enables gradual learning of complex parkour skills in simulation and can successfully transfer to the real world, outperforming manual training courses designed by humans.
Conformal Structured Prediction
Zhang, Botong, Li, Shuo, Bastani, Osbert
Conformal prediction has recently emerged as a promising strategy for quantifying the uncertainty of a predictive model; these algorithms modify the model to output sets of labels that are guaranteed to contain the true label with high probability. However, existing conformal prediction algorithms have largely targeted classification and regression settings, where the structure of the prediction set has a simple form as a level set of the scoring function. For complex structured outputs such as text generation, these prediction sets might include a large number of labels and therefore be hard for users to interpret. In this paper, we propose a general framework for conformal prediction in the structured prediction setting that modifies existing conformal prediction algorithms to output structured prediction sets that implicitly represent sets of labels. In addition, we demonstrate how our approach can be applied in domains where the prediction sets can be represented as a set of nodes in a directed acyclic graph; for instance, for hierarchical labels such as image classification, a prediction set might be a small subset of coarse labels implicitly representing the prediction set of all their finer-grained descendants. We demonstrate how our algorithm can be used to construct prediction sets that satisfy a desired coverage guarantee in several domains.
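For context, the level-set construction that this abstract attributes to existing classification methods can be sketched with standard split conformal prediction. This is the textbook baseline, not the structured algorithm proposed in the paper; the function names and the convention that lower scores mean better fit are assumptions made for the example.

```python
import math


def conformal_threshold(calibration_scores, alpha):
    """Return the ceil((n + 1) * (1 - alpha))-th smallest calibration score.

    Each calibration score is the nonconformity of the true label on a
    held-out example (lower means the model fits that label better).
    """
    n = len(calibration_scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: include every label.
        return float("inf")
    return sorted(calibration_scores)[k - 1]


def prediction_set(label_scores, threshold):
    """Level set of the scoring function: labels scoring at most threshold."""
    return {label for label, score in label_scores.items() if score <= threshold}


# Toy usage with nine calibration scores and a 50% miscoverage level.
calibration = [0.3, 0.1, 0.5, 0.2, 0.4, 0.9, 0.7, 0.6, 0.8]
tau = conformal_threshold(calibration, alpha=0.5)
labels = prediction_set({"cat": 0.2, "dog": 0.5, "fox": 0.9}, tau)
```

With thousands of fine-grained labels, a set built this way can become long and hard to read, which is the interpretability problem the abstract raises.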
Evidence Is All You Need: Ordering Imaging Studies via Language Model Alignment with the ACR Appropriateness Criteria
Yao, Michael S., Chae, Allison, Kahn, Charles E. Jr., Witschey, Walter R., Gee, James C., Sagreiya, Hersh, Bastani, Osbert
Diagnostic imaging studies are an increasingly important component of the workup and management of acutely presenting patients. However, ordering appropriate imaging studies according to evidence-based medical guidelines is a challenging task with a high degree of variability between healthcare providers. To address this issue, recent work has investigated whether generative AI and large language models can be leveraged to help clinicians order relevant imaging studies for patients. However, it is challenging to ensure that these tools are correctly aligned with medical guidelines, such as the American College of Radiology's Appropriateness Criteria (ACR AC). In this study, we introduce a framework to intelligently leverage language models by recommending imaging studies for patient cases that are aligned with evidence-based guidelines. We make available a novel dataset of patient "one-liner" scenarios to power our experiments, and optimize state-of-the-art language models to achieve an accuracy on par with clinicians in image ordering. Finally, we demonstrate that our language model-based pipeline can be used as an intelligent assistant by clinicians to support image ordering workflows and improve the accuracy of imaging study ordering according to the ACR AC. Our work demonstrates and validates a strategy to leverage AI-based software to improve trustworthy clinical decision making in alignment with expert evidence-based guidelines.
Parameter Estimation for Generalized Low-Rank Matrix Sensing by Learning on Riemannian Manifolds
Bastani, Osbert
We prove convergence guarantees for generalized low-rank matrix sensing -- i.e., matrix sensing where the observations may be passed through some nonlinear link function. We focus on local convergence of the optimal estimator, ignoring questions of optimization. In particular, assuming the minimizer of the empirical loss $\theta^0$ is in a constant size ball around the true parameters $\theta^*$, we prove that $d(\theta^0,\theta^*)=\tilde{O}(\sqrt{dk^2/n})$. Our analysis relies on tools from Riemannian geometry to handle the rotational symmetry in the parameter space.
Zeroth-Order Fine-Tuning of LLMs with Extreme Sparsity
Guo, Wentao, Long, Jikai, Zeng, Yimeng, Liu, Zirui, Yang, Xinyu, Ran, Yide, Gardner, Jacob R., Bastani, Osbert, De Sa, Christopher, Yu, Xiaodong, Chen, Beidi, Xu, Zhaozhuo
Zeroth-order optimization (ZO) is a memory-efficient strategy for fine-tuning Large Language Models using only forward passes. However, the application of ZO fine-tuning in memory-constrained settings such as mobile phones and laptops is still challenging since full precision forward passes are infeasible. In this study, we address this limitation by integrating sparsity and quantization into ZO fine-tuning of LLMs. Specifically, we investigate the feasibility of fine-tuning an extremely small subset of LLM parameters using ZO. This approach allows the majority of un-tuned parameters to be quantized to accommodate the constraint of limited device memory. Our findings reveal that the pre-training process can identify a set of "sensitive parameters" that can guide the ZO fine-tuning of LLMs on downstream tasks. Our results demonstrate that fine-tuning the 0.1% of parameters identified as sensitive with ZO can outperform the full ZO fine-tuning performance, while offering wall-clock time speedup. Additionally, we show that ZO fine-tuning targeting these 0.1% sensitive parameters, combined with 4-bit quantization, enables efficient ZO fine-tuning of a Llama2-7B model on a GPU device with less than 8 GiB of memory and notably reduced latency.
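A forward-only update restricted to a small parameter subset can be sketched as below. This uses the standard two-point random-perturbation gradient estimate and is an assumption-laden toy, not the paper's method: the function name, the plain-list parameters, and the way trainable_idx is chosen are all hypothetical, and quantization of the frozen parameters is not modeled.

```python
import random


def sparse_zo_step(params, loss_fn, trainable_idx, lr=0.05, eps=1e-3, rng=random):
    """One zeroth-order step that perturbs only the entries in trainable_idx.

    Uses two loss evaluations (forward passes) and no backpropagation:
        g ~ (L(theta + eps * z) - L(theta - eps * z)) / (2 * eps) * z
    where z is Gaussian on the trainable entries and zero elsewhere.
    """
    z = {i: rng.gauss(0.0, 1.0) for i in trainable_idx}

    plus, minus = list(params), list(params)
    for i, zi in z.items():
        plus[i] += eps * zi
        minus[i] -= eps * zi

    # Scalar directional-derivative estimate from the two forward passes.
    scale = (loss_fn(plus) - loss_fn(minus)) / (2 * eps)

    updated = list(params)
    for i, zi in z.items():
        updated[i] -= lr * scale * zi  # frozen entries are never written
    return updated


# Toy usage: only parameter 0 is trainable; parameter 1 stays frozen.
toy_loss = lambda p: (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2
theta = [0.0, 0.0]
initial_loss = toy_loss(theta)
step_rng = random.Random(0)
for _ in range(200):
    theta = sparse_zo_step(theta, toy_loss, trainable_idx=[0], rng=step_rng)
```

Because the frozen entries are only ever read, they could in principle be stored in a compressed form, which is the role quantization plays in the abstract.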
DrEureka: Language Model Guided Sim-To-Real Transfer
Ma, Yecheng Jason, Liang, William, Wang, Hung-Ju, Wang, Sam, Zhu, Yuke, Fan, Linxi, Bastani, Osbert, Jayaraman, Dinesh
Transferring policies learned in simulation to the real world is a promising strategy for acquiring robot skills at scale. However, sim-to-real approaches typically rely on manual design and tuning of the task reward function as well as the simulation physics parameters, rendering the process slow and human-labor intensive. In this paper, we investigate using Large Language Models (LLMs) to automate and accelerate sim-to-real design. Our LLM-guided sim-to-real approach, DrEureka, requires only the physics simulation for the target task and automatically constructs suitable reward functions and domain randomization distributions to support real-world transfer. We first demonstrate that our approach can discover sim-to-real configurations that are competitive with existing human-designed ones on quadruped locomotion and dexterous manipulation tasks. Then, we showcase that our approach is capable of solving novel robot tasks, such as quadruped balancing and walking atop a yoga ball, without iterative manual design.