Collaborating Authors

 Fern, Xiaoli


Neural Networks Learn Statistics of Increasing Complexity

arXiv.org Artificial Intelligence

The distributional simplicity bias (DSB) posits that neural networks learn low-order moments of the data distribution first, before moving on to higher-order correlations. In this work, we present compelling new evidence for the DSB by showing that networks automatically learn to perform well on maximum-entropy distributions whose low-order statistics match those of the training set early in training, then lose this ability later. We also extend the DSB to discrete domains by proving an equivalence between token $n$-gram frequencies and the moments of embedding vectors, and by finding empirical evidence for the bias in LLMs. Finally we use optimal transport methods to surgically edit the low-order statistics of one class to match those of another, and show that early-training networks treat the edited samples as if they were drawn from the target class. Code is available at https://github.com/EleutherAI/features-across-time.
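The class-editing step can be illustrated with the closed-form optimal transport map between two Gaussians, which makes the mean and covariance of one sample set match another's (a minimal sketch of the moment-matching idea, not the paper's implementation; the function name is illustrative):

```python
import numpy as np
from scipy.linalg import sqrtm

def gaussian_ot_map(x, mu_s, cov_s, mu_t, cov_t):
    """Closed-form optimal transport map between two Gaussians:
        T(x) = mu_t + A (x - mu_s),
        A = cov_s^{-1/2} (cov_s^{1/2} cov_t cov_s^{1/2})^{1/2} cov_s^{-1/2}.
    Applying T to samples makes their first two moments match the
    target's while moving each point as little as possible (in W2)."""
    s_half = sqrtm(cov_s)
    s_half_inv = np.linalg.inv(s_half)
    A = s_half_inv @ sqrtm(s_half @ cov_t @ s_half) @ s_half_inv
    return mu_t + (x - mu_s) @ A.T
```

Editing samples of one class this way, with (mu_t, cov_t) estimated from the target class, yields inputs whose low-order statistics belong to the target class but whose higher-order structure does not.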


Bayesian Optimization with Resource Constraints and Production

AAAI Conferences

In this paper, we aim to take a step toward a tighter integration of automated planning and Bayesian Optimization (BO). BO is an approach for optimizing costly-to-evaluate functions by selecting a limited number of experiments that each evaluate the function at a specified input. Typical BO formulations assume that experiments are selected one at a time, or in fixed batches, and that experiments can be executed immediately upon request. This setup fails to capture many real-world domains where the execution of an experiment requires setup and preparation time. In this paper, we define a novel BO problem formulation that models the resources and activities needed to prepare and run experiments. We then present a planning approach, based on finite-horizon tree search, for scheduling the potentially concurrent experimental activities with the aim of best optimizing the function within a limited time horizon. A key element of the approach is a novel state evaluation function for evaluating leaves of the search tree, for which we prove approximate guarantees. We evaluate the approach on a number of diverse benchmark problems and show that it produces high-quality results compared to a number of natural baselines.


Toward Embedding Bayesian Optimization in the Lab: Reasoning about Resource and Actions

AAAI Conferences

We consider optimizing an unknown function f by running experiments that each take an input x and return a noisy output f(x). In particular, we focus on the setting where experiments are expensive, limiting the number of experiments that can be run. Bayesian Optimization (BO) addresses this setting by maintaining a Bayesian posterior over f to capture our uncertainty about f given prior experiments (Jones 2001; Brochu, Cora, and de Freitas 2010). The posterior is then used to select new experiments that trade off exploring uncertain areas of the experimental space and exploiting promising areas. A key contribution of this paper is to introduce an extended BO setting, called Bayesian Optimization with Resources (BOR), that explicitly models experimental resources and activities. In particular, our model specifies the following: 1) resource requirements for experiments, which may vary across different experiments, 2) resource-production actions, which produce the various resources and can require varying amounts of time, and 3) a set of "labs" for running concurrent experiments and a set of "production lines" for concurrent resource production. The problem is then to select and schedule the experiments and resource-production actions in order to optimize the unknown objective.


Learning Greedy Policies for the Easy-First Framework

AAAI Conferences

Easy-first, a search-based structured prediction approach, has been applied to many NLP tasks including dependency parsing and coreference resolution. This approach employs a learned greedy policy (action scoring function) to make easy decisions first, which constrains the remaining decisions and makes them easier. We formulate greedy policy learning in the Easy-first approach as a novel non-convex optimization problem and solve it via an efficient Majorization Minimization (MM) algorithm. Results on within-document coreference and cross-document joint entity and event coreference tasks demonstrate that the proposed approach achieves statistically significant performance improvement over existing training regimes for Easy-first and is less susceptible to overfitting.
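The easy-first inference loop itself is simple; a minimal sketch follows, where the `score` and `apply_action` callables stand in for the learned policy and the task-specific state update (both assumed here, not taken from the paper):

```python
def easy_first_decode(state, actions, score, apply_action):
    """Greedy easy-first inference: at each step, take the highest-scoring
    ('easiest') remaining action; applying it constrains the state, which
    makes the remaining decisions easier to score."""
    pending = list(actions)
    while pending:
        easiest = max(pending, key=lambda a: score(state, a))
        state = apply_action(state, easiest)
        pending.remove(easiest)
    return state
```

The paper's contribution is how to *train* `score` so that this greedy loop succeeds, not the loop itself.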


Learning Scripts as Hidden Markov Models

AAAI Conferences

Scripts have been proposed to model the stereotypical event sequences found in narratives. They can be applied to make a variety of inferences, including filling gaps in the narratives and resolving ambiguous references. This paper proposes the first formal framework for scripts based on Hidden Markov Models (HMMs). Our framework supports robust inference and learning algorithms, which are lacking in previous clustering models. We develop an algorithm for structure and parameter learning based on Expectation Maximization and evaluate it on a number of natural datasets. The results show that our algorithm is superior to several informed baselines for predicting missing events in partial observation sequences.
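Once a script HMM is learned, predicting a missing event reduces to scoring each candidate fill with the forward algorithm; a minimal sketch under an already-learned model (the matrices in the example are illustrative assumptions, not parameters learned from narrative data):

```python
import numpy as np

def forward_loglik(pi, T, E, obs):
    """Log-likelihood of an observed event sequence under an HMM
    (pi: initial state distribution, T: state transitions, E: emissions),
    computed with the scaled forward algorithm for numerical stability."""
    alpha = pi * E[:, obs[0]]
    logp = np.log(alpha.sum())
    alpha = alpha / alpha.sum()
    for o in obs[1:]:
        alpha = (alpha @ T) * E[:, o]
        z = alpha.sum()
        logp += np.log(z)
        alpha = alpha / z
    return logp

def fill_gap(pi, T, E, obs, gap):
    """Predict the missing event at position `gap` by choosing the
    candidate that maximizes the likelihood of the completed sequence."""
    n_events = E.shape[1]
    return max(range(n_events),
               key=lambda e: forward_loglik(pi, T, E, obs[:gap] + [e] + obs[gap + 1:]))
```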


A Lipschitz Exploration-Exploitation Scheme for Bayesian Optimization

arXiv.org Machine Learning

The problem of optimizing unknown costly-to-evaluate functions has been studied for a long time in the context of Bayesian Optimization. Algorithms in this field aim to find the optimizer of the function by requesting only a few function evaluations at locations carefully selected based on a posterior model. In this paper, we assume the unknown function is Lipschitz continuous. Leveraging the Lipschitz property, we propose an algorithm with a distinct exploration phase followed by an exploitation phase. The exploration phase aims to select samples that shrink the search space as much as possible. The exploitation phase then focuses on the reduced search space and selects samples closest to the optimizer. Considering the Expected Improvement (EI) as a baseline, we empirically show that the proposed algorithm significantly outperforms EI.
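The space-shrinking idea rests on the basic Lipschitz bound: for maximization, f(x) <= min_i [f(x_i) + L * ||x - x_i||], so any candidate whose bound falls below the best observed value can be discarded. An illustrative sketch of that bound (not the paper's algorithm; the Lipschitz constant L is assumed known):

```python
import numpy as np

def lipschitz_upper_bound(x, X_obs, y_obs, L):
    """Tightest upper bound on f(x) implied by Lipschitz continuity:
    f(x) <= min_i (y_i + L * ||x - x_i||)."""
    dists = np.linalg.norm(X_obs - x, axis=1)
    return np.min(y_obs + L * dists)

def keep_candidates(X_cand, X_obs, y_obs, L):
    """Mask of candidate points that could still beat the best observed
    value; the remaining points provably cannot contain the maximizer."""
    best = y_obs.max()
    ub = np.array([lipschitz_upper_bound(x, X_obs, y_obs, L) for x in X_cand])
    return ub >= best
```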


Hybrid Batch Bayesian Optimization

arXiv.org Artificial Intelligence

Bayesian Optimization aims at optimizing an unknown non-convex/concave function that is costly to evaluate. We are interested in application scenarios where concurrent function evaluations are possible. Under such a setting, BO could choose to either sequentially evaluate the function, one input at a time and wait for the output of the function before making the next selection, or evaluate the function at a batch of multiple inputs at once. These two different settings are commonly referred to as the sequential and batch settings of Bayesian Optimization. In general, the sequential setting leads to better optimization performance as each function evaluation is selected with more information, whereas the batch setting has an advantage in terms of the total experimental time (the number of iterations). In this work, our goal is to combine the strength of both settings. Specifically, we systematically analyze Bayesian optimization using a Gaussian process as the posterior estimator and provide a hybrid algorithm that, based on the current state, dynamically switches between a sequential policy and a batch policy with variable batch sizes. We provide theoretical justification for our algorithm and present experimental results on eight benchmark BO problems. The results show that our method achieves substantial speedup (up to 78%) compared to a pure sequential policy, without suffering any significant performance loss.
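Both the sequential and batch policies score candidates with an acquisition function over the GP posterior; Expected Improvement is the standard choice. A minimal sketch of EI for maximization (the acquisition function only, not the paper's switching logic):

```python
import numpy as np
from scipy.stats import norm

def expected_improvement(mu, sigma, best):
    """Expected Improvement of a candidate whose GP posterior mean is `mu`
    and posterior standard deviation is `sigma`, relative to the best
    function value observed so far (maximization)."""
    z = (mu - best) / sigma
    return (mu - best) * norm.cdf(z) + sigma * norm.pdf(z)
```

A pure sequential policy evaluates the argmax of EI, waits for the result, and refits the GP; the hybrid algorithm instead decides from the current posterior whether to release several selections at once as a batch.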


Myopic Policies for Budgeted Optimization with Constrained Experiments

AAAI Conferences

Motivated by a real-world problem, we study a novel budgeted optimization problem where the goal is to optimize an unknown function f(x) given a budget. In our setting, it is not practical to request samples of f(x) at precise input values due to the formidable cost of precise experimental setup. Rather, we may request a constrained experiment, which is a subset r of the input space for which the experimenter returns x in r and f(x). Importantly, as the constraints become looser, the experimental cost decreases, but the uncertainty about the location x of the next observation increases. Our goal is to manage this trade-off by selecting a sequence of constrained experiments to best optimize f within the budget. We introduce cost-sensitive policies for selecting constrained experiments using both model-free and model-based approaches, inspired by policies for unconstrained settings. Experiments on synthetic functions and functions derived from real-world experimental data indicate that our policies outperform random selection, that the model-based policies are superior to model-free ones, and provide insight into which policies are preferable overall.