Marcus, Ryan
Large Scale Multi-Task Bayesian Optimization with Large Language Models
Zeng, Yimeng, Maus, Natalie, Jones, Haydn Thomas, Tao, Jeffrey, Wan, Fangping, Torres, Marcelo Der Torossian, de la Fuente-Nunez, Cesar, Marcus, Ryan, Bastani, Osbert, Gardner, Jacob R.
In multi-task Bayesian optimization, the goal is to leverage experience from optimizing existing tasks to improve the efficiency of optimizing new ones. While approaches using multi-task Gaussian processes or deep kernel transfer exist, the performance improvement is marginal when scaling to more than a moderate number of tasks. We introduce a novel approach leveraging large language models (LLMs) to learn from, and improve upon, previous optimization trajectories, scaling to approximately 2000 distinct tasks. Specifically, we propose an iterative framework in which an LLM is fine-tuned on the high-quality solutions produced by BayesOpt, using previous search trajectories to generate improved initializations that accelerate convergence on future optimization tasks. We evaluate our method on two distinct domains: database query optimization and antimicrobial peptide design. Results demonstrate that our approach creates a positive feedback loop, where the LLM's generated initializations gradually improve, leading to better optimization performance. As this feedback loop continues, we find that the LLM is eventually able to generate solutions to new tasks in just a few shots that are better than the solutions produced from scratch by Bayesian optimization, while simultaneously requiring significantly fewer oracle calls.
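The feedback loop described above is easy to illustrate schematically. The sketch below is a toy illustration rather than the paper's system: bayesopt is a random-search placeholder for a real Bayesian optimization routine, and ToyGenerator stands in for the fine-tuned LLM; both names, and the one-dimensional task family, are hypothetical.

    import random

    def bayesopt(oracle, inits, budget=50):
        """Placeholder optimizer: scores initializations plus random probes."""
        candidates = list(inits) + [random.uniform(-5, 5) for _ in range(budget)]
        return max(candidates, key=oracle)

    class ToyGenerator:
        """Stands in for the fine-tuned LLM: proposes inits near past optima."""
        def __init__(self):
            self.archive = []  # high-quality solutions from earlier tasks

        def finetune(self, solutions):
            self.archive.extend(solutions)

        def propose(self, k=3):
            if not self.archive:  # no experience yet, so cold-start randomly
                return [random.uniform(-5, 5) for _ in range(k)]
            seeds = random.sample(self.archive, min(k, len(self.archive)))
            return [random.gauss(s, 0.1) for s in seeds]

    generator = ToyGenerator()
    tasks = [lambda x, c=c: -(x - c) ** 2 for c in (1.0, 1.1, 0.9, 1.05)]

    for task in tasks:
        inits = generator.propose()   # LLM-style initializations
        best = bayesopt(task, inits)  # refine with the placeholder optimizer
        generator.finetune([best])    # feed the solution back to the generator
        print(f"best x = {best:.3f}")

Later tasks start from initializations informed by earlier optima, which is the positive feedback loop the abstract describes.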
The Unreasonable Effectiveness of LLMs for Query Optimization
Akioyamen, Peter, Yi, Zixuan, Marcus, Ryan
Recent work in database query optimization has used complex machine learning strategies, such as customized reinforcement learning schemes. Surprisingly, we show that LLM embeddings of query text contain useful semantic information for query optimization. Specifically, we show that a simple binary classifier deciding between alternative query plans, trained only on a small number of labeled embedded query vectors, can outperform existing heuristic systems. Although we present only preliminary results, an LLM-powered query optimizer could provide significant benefits, both in terms of performance and simplicity.
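A minimal sketch of the classification setup the abstract describes, assuming a hypothetical embed_query helper in place of a real LLM embedding call; here it hashes tokens into a fixed-size vector so the example runs without any external service.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    def embed_query(sql, dim=64):
        """Hypothetical stand-in for an LLM embedding of the query text."""
        vec = np.zeros(dim)
        for token in sql.lower().split():
            vec[hash(token) % dim] += 1.0
        return vec

    # Label 1 means "the alternative plan beat the optimizer's default".
    training_queries = [
        ("SELECT * FROM orders JOIN customers ON ...", 1),
        ("SELECT count(*) FROM lineitem WHERE ...", 0),
        ("SELECT * FROM parts JOIN suppliers ON ...", 1),
        ("SELECT avg(price) FROM lineitem WHERE ...", 0),
    ]
    X = np.stack([embed_query(q) for q, _ in training_queries])
    y = np.array([label for _, label in training_queries])

    clf = LogisticRegression().fit(X, y)  # simple binary classifier

    new_query = "SELECT * FROM orders JOIN parts ON ..."
    choice = clf.predict(embed_query(new_query).reshape(1, -1))[0]
    print("use alternative plan" if choice else "keep default plan")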
Kepler: Robust Learning for Faster Parametric Query Optimization
Doshi, Lyric, Zhuang, Vincent, Jain, Gaurav, Marcus, Ryan, Huang, Haoyu, Altinbüken, Deniz, Brevdo, Eugene, Fraser, Campbell
Most existing parametric query optimization (PQO) techniques rely on traditional query optimizer cost models, which are often inaccurate and result in suboptimal query performance. We propose Kepler, an end-to-end learning-based approach to PQO that demonstrates significant speedups in query latency over a traditional query optimizer. Central to our method is Row Count Evolution (RCE), a novel plan generation algorithm based on perturbations in the sub-plan cardinality space. While previous approaches require accurate cost models, we bypass this requirement by evaluating candidate plans via actual execution data and training an ML model to predict the fastest plan given parameter binding values. Our models leverage recent advances in neural network uncertainty estimation to robustly predict faster plans while avoiding regressions in query performance. Experimentally, we show that Kepler achieves significant improvements in query runtime on multiple datasets on PostgreSQL.
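Row Count Evolution can be sketched in a few lines. In the sketch below, plan_for is a hypothetical stand-in for a call into the DBMS planner with injected cardinality hints (it merely fingerprints the estimates so the example is self-contained), and the perturbation factors are illustrative.

    import random

    def plan_for(cardinalities):
        """Stand-in for re-planning with the given sub-plan row counts."""
        return tuple(round(c, -2) for c in cardinalities)

    base_estimates = [1000.0, 50000.0, 200.0]  # optimizer's sub-plan row counts
    candidates = {plan_for(base_estimates)}

    for generation in range(3):  # evolve candidates in the cardinality space
        for _ in range(10):
            perturbed = [c * random.choice([0.1, 0.5, 2.0, 10.0])
                         for c in base_estimates]
            candidates.add(plan_for(perturbed))

    print(f"{len(candidates)} distinct candidate plans to execute and label")

Kepler then executes the candidate plans to collect actual runtimes and trains a model that maps parameter binding values to the empirically fastest plan.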
Class-Weighted Evaluation Metrics for Imbalanced Data Classification
Gupta, Akhilesh, Tatbul, Nesime, Marcus, Ryan, Zhou, Shengtian, Lee, Insup, Gottschlich, Justin
Class distribution skews in imbalanced datasets may lead to models with prediction bias towards majority classes, making fair assessment of classifiers a challenging task. Balanced Accuracy is a popular metric used to evaluate a classifier's prediction performance under such scenarios. However, this metric falls short when classes vary in importance, especially when class importance is skewed differently from class cardinality distributions. In this paper, we propose a simple and general-purpose evaluation framework for imbalanced data classification that is sensitive to arbitrary skews in class cardinalities and importances. Experiments with several state-of-the-art classifiers tested on real-world datasets and benchmarks from two different domains show that our new framework is more effective than Balanced Accuracy, not only in evaluating and ranking model predictions, but also in training the models themselves.
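One plausible instantiation of such a metric is an importance-weighted mean of per-class recalls; with uniform importance it reduces to standard Balanced Accuracy. The exact weighting used in the paper may differ, so treat this as a sketch.

    import numpy as np

    def class_weighted_accuracy(y_true, y_pred, importance):
        """Importance-weighted mean of per-class recalls."""
        classes = np.unique(y_true)
        recalls = np.array([np.mean(y_pred[y_true == c] == c) for c in classes])
        w = np.array([importance[c] for c in classes], dtype=float)
        return float(np.sum(w * recalls) / np.sum(w))

    y_true = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 2])
    y_pred = np.array([0, 0, 0, 0, 0, 1, 1, 1, 0, 2])
    print(class_weighted_accuracy(y_true, y_pred, {0: 1, 1: 1, 2: 1}))    # = Balanced Accuracy
    print(class_weighted_accuracy(y_true, y_pred, {0: 0.2, 1: 1, 2: 5}))  # rare class dominates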
MISIM: A Novel Code Similarity System
Ye, Fangke, Zhou, Shengtian, Venkat, Anand, Marcus, Ryan, Tatbul, Nesime, Tithi, Jesmin Jahan, Hasabnis, Niranjan, Petersen, Paul, Mattson, Timothy, Kraska, Tim, Dubey, Pradeep, Sarkar, Vivek, Gottschlich, Justin
Code similarity systems are integral to a range of applications from code recommendation to automated software defect correction. We argue that code similarity is now a first-order problem that must be solved. To begin to address this, we present machine Inferred Code Similarity (MISIM), a novel end-to-end code similarity system that consists of two core components. First, MISIM uses a novel context-aware semantic structure, which is designed to aid in lifting semantic meaning from code syntax. Second, MISIM provides a neural-based code similarity scoring algorithm, which can be implemented with various neural network architectures with learned parameters. We compare MISIM to three state-of-the-art code similarity systems: (i) code2vec, (ii) Neural Code Comprehension, and (iii) Aroma. In our experimental evaluation across 328,155 programs (over 18 million lines of code), MISIM has 1.5x to 43.4x better accuracy than all three systems.
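The scoring interface can be illustrated with a drastically simplified sketch: featurize is a hypothetical bag-of-tokens stand-in for MISIM's context-aware semantic structure, and cosine similarity replaces the learned neural scorer so the example is self-contained.

    import numpy as np

    def featurize(code, dim=128):
        """Hypothetical token-count features; MISIM uses a semantic structure."""
        vec = np.zeros(dim)
        for token in code.replace("(", " ").replace(")", " ").split():
            vec[hash(token) % dim] += 1.0
        return vec

    def similarity(code_a, code_b):
        a, b = featurize(code_a), featurize(code_b)
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    prog_a = "total = 0\nfor i in range(n): total += xs[i]"
    prog_b = "s = 0\nfor v in xs: s += v"
    print(f"similarity: {similarity(prog_a, prog_b):.3f}")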