bayesopt
Reviewer #1: Thanks for the comments!
We should clarify that the theoretical results already consider out-of-sample generalization. There are connections between this work and entropy-regularized RL, but there are also distinctions; these allow us to prove new generalization bounds in the form of Theorem 8. We also have the same suite of results prepared for MNIST, and standard deviations for Table 1.
"It is unclear to me if the reward estimation algorithm is actually evaluated in the experiments." Yes, Section 3.6 used
"Can you comment on the increased variance demonstrated by Composite on Table 2?" To produce Table 2,
"I find curious that [...] all the experiments consists of classification tasks "reworked" [...]." The Criteo dataset is a benchmark in this area; it was extracted from a real online advertising challenge.
- Health & Medicine > Pharmaceuticals & Biotechnology (0.68)
- Energy > Oil & Gas (0.46)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.50)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.45)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.45)
- Europe > Switzerland > Zürich > Zürich (0.14)
- Europe > United Kingdom > England (0.14)
- North America > United States (0.14)
AgentForge: A Flexible Low-Code Platform for Reinforcement Learning Agent Design
Junior, Francisco Erivaldo Fernandes, Oulasvirta, Antti
Developing a reinforcement learning (RL) agent often involves identifying values for numerous parameters, covering the policy, reward function, environment, and agent-internal architecture. Since these parameters are interrelated in complex ways, optimizing them is a black-box problem that proves especially challenging for nonexperts. Although existing optimization-as-a-service platforms (e.g., Vizier and Optuna) can handle such problems, they are impractical for RL systems: the user must manually map each parameter to a distinct component, which makes the effort cumbersome. They also require an understanding of the optimization process, limiting their application beyond the machine learning field and restricting access in areas such as cognitive science, which models human decision-making. To tackle these challenges, the paper presents AgentForge, a flexible low-code platform for optimizing any parameter set across an RL system. Available at https://github.com/feferna/AgentForge, it allows an optimization problem to be defined in a few lines of code and handed to any of the interfaced optimizers. With AgentForge, the user can optimize the parameters either individually or jointly. The paper presents an evaluation of its performance for a challenging vision-based RL problem.
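The joint, low-code style of problem definition the abstract describes can be sketched in plain Python. The parameter names, ranges, and toy objective below are illustrative assumptions, not AgentForge's actual API, and simple random search stands in for an interfaced optimizer such as Optuna:

```python
import math
import random

# Hypothetical joint parameter space spanning policy, reward, and environment
# settings. This is not AgentForge's actual API; it is just a sketch of the
# kind of low-code problem definition the abstract describes.
SPACE = {
    "policy.learning_rate": ("log_uniform", 1e-5, 1e-2),
    "policy.hidden_units": ("choice", (64, 128, 256)),
    "reward.shaping_weight": ("uniform", 0.0, 1.0),
    "env.frame_skip": ("choice", (1, 2, 4)),
}

def sample(space, rng):
    """Draw one joint configuration from the search space."""
    params = {}
    for name, (kind, *args) in space.items():
        if kind == "uniform":
            lo, hi = args
            params[name] = rng.uniform(lo, hi)
        elif kind == "log_uniform":
            lo, hi = args
            params[name] = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        elif kind == "choice":
            params[name] = rng.choice(args[0])
    return params

def toy_objective(params):
    """Stand-in for an expensive RL training run (higher is better)."""
    lr_penalty = (math.log10(params["policy.learning_rate"]) + 3.5) ** 2
    return 100.0 - 10.0 * lr_penalty - abs(params["reward.shaping_weight"] - 0.3)

def optimize(space, objective, n_trials=200, seed=0):
    """Black-box joint optimization; any interfaced optimizer could slot in here."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        params = sample(space, rng)
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score

best, score = optimize(SPACE, toy_objective)
```

The point of the sketch is the separation of concerns: the user only declares the space and supplies an evaluation function, while the optimizer handles the search.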
- Europe > Finland (0.05)
- South America > Ecuador > Pichincha Province > Quito (0.04)
- North America > United States (0.04)
A Bayesian Optimization Approach to Machine Translation Reranking
Cheng, Julius, Züfle, Maike, Zouhar, Vilém, Vlachos, Andreas
Reranking a list of candidates from a machine translation system with an external scoring model and returning the highest-scoring candidate remains a simple and effective method for improving the overall output quality. Translation scoring models continue to grow in size, with the best models being comparable to generation models. Thus, reranking can add substantial computational cost to the translation pipeline. In this work, we pose reranking as a Bayesian optimization (BayesOpt) problem. By strategically selecting candidates to score based on a balance of exploration and exploitation, we show that it is possible to find top-scoring candidates when scoring only a fraction of the candidate list. For instance, our method achieves the same CometKiwi score using only 70 scoring evaluations compared to a baseline system using 180. We present a multi-fidelity setting for BayesOpt, where the candidates are first scored with a cheaper but noisier proxy scoring model, which further improves the cost-performance tradeoff when using smaller but well-trained distilled proxy scorers.
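A minimal sketch of the reranking-as-BayesOpt idea, assuming synthetic candidate features and a synthetic scorer in place of a real translation model: a Gaussian-process surrogate with a UCB acquisition selects which candidates to score under a fixed budget.

```python
import numpy as np

# Toy sketch: under a scoring budget, pick which candidates to send to the
# expensive scorer, balancing exploration and exploitation with a GP surrogate
# and a UCB acquisition. The candidate "features" and true scoring function
# are synthetic stand-ins, not CometKiwi or any real translation scorer.
rng = np.random.default_rng(0)
n_candidates, budget = 100, 15
feats = np.sort(rng.uniform(0, 10, n_candidates))        # 1-D candidate features
true_scores = np.sin(feats) + 0.1 * rng.normal(size=n_candidates)

def rbf(a, b, ls=1.0):
    """Squared-exponential kernel between 1-D feature vectors."""
    return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ls**2)

scored = [0, n_candidates // 2, n_candidates - 1]        # seed observations
for _ in range(budget - len(scored)):
    X, y = feats[scored], true_scores[scored]
    K = rbf(X, X) + 1e-6 * np.eye(len(X))                # jitter for stability
    Ks = rbf(feats, X)
    mu = Ks @ np.linalg.solve(K, y)                      # GP posterior mean
    var = np.clip(1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T)),
                  0.0, None)                             # posterior variance
    ucb = mu + 2.0 * np.sqrt(var)                        # upper confidence bound
    ucb[scored] = -np.inf                                # never rescore a candidate
    scored.append(int(np.argmax(ucb)))

# Return the best candidate among those actually scored.
best = scored[int(np.argmax(true_scores[scored]))]
```

Only 15 of 100 candidates ever reach the scorer, which is the cost saving the abstract quantifies.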
Learning Dynamic Tasks on a Large-scale Soft Robot in a Handful of Trials
Zwane, Sicelukwanda, Cheney, Daniel, Johnson, Curtis C., Luo, Yicheng, Bekiroglu, Yasemin, Killpack, Marc D., Deisenroth, Marc Peter
Soft robots offer more flexibility, compliance, and adaptability than traditional rigid robots. They are also typically lighter and cheaper to manufacture. However, their use in real-world applications is limited due to modeling challenges and difficulties in integrating effective proprioceptive sensors. Large-scale soft robots ($\approx$ two meters in length) have greater modeling complexity due to increased inertia and related effects of gravity. Common efforts to ease these modeling difficulties, such as assuming simple kinematic and dynamic models, also limit the general capabilities of soft robots and are not applicable to tasks requiring fast, dynamic motion, like throwing and hammering. To overcome these challenges, we propose a data-efficient Bayesian optimization-based approach for learning control policies for dynamic tasks on a large-scale soft robot. Our approach optimizes the task objective function directly from commanded pressures, without requiring approximate kinematics or dynamics as an intermediate step. We demonstrate the effectiveness of our approach through both simulated and real-world experiments.
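The model-free loop described above can be sketched as follows; the pressure ranges, synthetic rollout reward, kernel length scale, and trial counts are illustrative assumptions, not the paper's actual robot or objective.

```python
import math
import numpy as np

# Sketch of the setup: optimize a task objective directly over commanded
# pressures with Bayesian optimization (GP surrogate + expected improvement),
# skipping any kinematic or dynamic model. The "robot" is a synthetic
# black-box reward; the pressure range is an assumed illustration.
rng = np.random.default_rng(1)
LOW, HIGH = 0.0, 100.0            # assumed commanded-pressure range

def rollout(pressures):
    """Stand-in for running the soft robot once and measuring task reward."""
    p1, p2 = pressures
    return -((p1 - 60.0) ** 2 + (p2 - 35.0) ** 2) / 1000.0

def rbf(A, B, ls=25.0):
    """Squared-exponential kernel over pressure vectors."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / ls**2)

def expected_improvement(mu, sigma, best):
    """EI for maximization under a Gaussian posterior."""
    z = np.where(sigma > 0, (mu - best) / np.maximum(sigma, 1e-12), 0.0)
    phi = np.exp(-0.5 * z**2) / math.sqrt(2 * math.pi)
    Phi = 0.5 * (1 + np.vectorize(math.erf)(z / math.sqrt(2)))
    return (mu - best) * Phi + sigma * phi

X = rng.uniform(LOW, HIGH, size=(5, 2))          # a handful of initial trials
y = np.array([rollout(x) for x in X])
cand = rng.uniform(LOW, HIGH, size=(500, 2))     # random candidate pressures
for _ in range(15):                              # small trial budget
    K = rbf(X, X) + 1e-6 * np.eye(len(X))
    Ks = rbf(cand, X)
    mu = Ks @ np.linalg.solve(K, y)
    var = np.clip(1.0 - np.einsum("ij,ji->i", Ks, np.linalg.solve(K, Ks.T)),
                  0.0, None)
    ei = expected_improvement(mu, np.sqrt(var), y.max())
    x_next = cand[int(np.argmax(ei))]            # next pressures to try
    X = np.vstack([X, x_next])
    y = np.append(y, rollout(x_next))

best_pressures = X[int(np.argmax(y))]
```

Each loop iteration corresponds to one physical trial, which is why data efficiency matters on hardware.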
- North America > United States > Utah (0.14)
- Europe > Sweden (0.14)
- North America > United States > New York > New York County > New York City (0.04)
Bayesian optimization for stable properties amid processing fluctuations in sputter deposition
Shrivastava, Ankit, Kalaswad, Matias, Custer, Joyce O., Adams, David P., Najm, Habib N.
We introduce a Bayesian optimization approach to guide the sputter deposition of molybdenum thin films, aiming to achieve desired residual stress and sheet resistance while minimizing susceptibility to stochastic fluctuations during deposition. Thin films are pivotal in numerous technologies, including semiconductors and optical devices, where their properties are critical. Sputter deposition parameters, such as deposition power, vacuum chamber pressure, and working distance, influence physical properties like residual stress and resistance. Excessive stress and high resistance can impair device performance, necessitating the selection of optimal process parameters. Furthermore, these parameters should ensure the consistency and reliability of thin-film properties, aiding the reproducibility of the devices. However, exploring the multidimensional design space for process optimization is expensive. Bayesian optimization is ideal for optimizing inputs/parameters of general black-box functions without reliance on gradient information. We utilize Bayesian optimization to optimize deposition power and pressure using a custom-built objective function incorporating observed stress and resistance data. Additionally, we integrate prior knowledge of stress variation with pressure into the objective function to prioritize films least affected by stochastic variations. Our findings demonstrate that Bayesian optimization effectively explores the design space and identifies optimal parameter combinations meeting desired stress and resistance specifications.
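A hedged sketch of what such a custom-built objective might look like; the targets, weights, and the pressure-sensitivity term below are illustrative assumptions, not the paper's calibrated quantities.

```python
# Illustrative composite objective in the spirit of the abstract: penalize
# deviation of film stress and sheet resistance from targets, plus a term that
# down-weights regions where stress varies steeply with chamber pressure
# (encoding the prior that flat stress-vs-pressure regions are more robust to
# stochastic fluctuations). All targets and weights are assumed values.
def deposition_objective(stress_mpa, resistance_ohm_sq, dstress_dpressure,
                         stress_target=0.0, resistance_target=2.0,
                         w_stress=1.0, w_res=1.0, w_robust=0.5):
    """Lower is better; fed to a Bayesian optimizer over (power, pressure)."""
    stress_term = w_stress * (stress_mpa - stress_target) ** 2
    res_term = w_res * max(0.0, resistance_ohm_sq - resistance_target) ** 2
    robust_term = w_robust * abs(dstress_dpressure)  # prefer insensitive films
    return stress_term + res_term + robust_term
```

The robustness term is what distinguishes this from a plain target-matching objective: two parameter settings with identical stress and resistance can score differently if one sits on a steep stress-vs-pressure slope.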
- North America > United States (0.68)
- Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
- Energy (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
Quality-Weighted Vendi Scores And Their Application To Diverse Experimental Design
Nguyen, Quan, Dieng, Adji Bousso
Experimental design techniques such as active search and Bayesian optimization are widely used in the natural sciences for data collection and discovery. However, existing techniques tend to favor exploitation over exploration of the search space, which causes them to get stuck in local optima. This "collapse" problem prevents experimental design algorithms from yielding diverse high-quality data. In this paper, we extend the Vendi scores -- a family of interpretable similarity-based diversity metrics -- to account for quality. We then leverage these quality-weighted Vendi scores to tackle experimental design problems across various applications, including drug discovery, materials discovery, and reinforcement learning. We found that quality-weighted Vendi scores allow us to construct policies for experimental design that flexibly balance quality and diversity, and ultimately assemble rich and diverse sets of high-performing data points. Our algorithms led to a 70%-170% increase in the number of effective discoveries compared to baselines.
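The Vendi score is the exponential of the Shannon entropy of the eigenvalues of the normalized similarity matrix K/n, so n identical items score 1 and n completely dissimilar items score n. A minimal sketch follows, with the quality weighting implemented as average quality times diversity; that product form is one simple choice for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def vendi_score(K):
    """Vendi score: exp of the entropy of eigenvalues of K/n (K symmetric PSD
    similarity matrix with unit self-similarity). Ranges from 1 to n."""
    lam = np.linalg.eigvalsh(K / K.shape[0])
    lam = lam[lam > 1e-12]                   # drop numerical zeros; 0*log(0)=0
    return float(np.exp(-np.sum(lam * np.log(lam))))

def quality_weighted_vendi(K, qualities):
    """Illustrative quality weighting: mean quality times diversity."""
    return float(np.mean(qualities)) * vendi_score(K)

# Sanity checks: n identical items -> diversity 1; n orthogonal items -> n.
n = 4
K_same = np.ones((n, n))
K_distinct = np.eye(n)
```

A selection policy built on this quantity would prefer adding a point that raises either the average quality or the effective number of distinct items, which is the quality-diversity balance the abstract describes.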