AITopics | sequential policy

High-dimensional action spaces remain a challenge for dynamic algorithm configuration (DAC). Interdependencies and varying importance between action dimensions are further known key characteristics of DAC problems. We argue that these Coupled Action Dimensions with Importance Differences (CANDID) represent aspects of the DAC problem that are not yet fully explored. To address this gap, we introduce a new white-box benchmark within the DACBench suite that simulates the properties of CANDID. Further, we propose sequential policies as an effective strategy for managing these properties. Such policies factorize the action space and mitigate exponential growth by learning a policy per action dimension. At the same time, these policies accommodate the interdependence of action dimensions by fostering implicit coordination. We show this in an experimental study of value-based policies on our new benchmark. This study demonstrates that sequential policies significantly outperform independent learning of factorized policies in CANDID action spaces. In addition, they overcome the scalability limitations associated with learning a single policy across all action dimensions. The code used for our experiments is available under https://github.com/PhilippBordne/candidDAC.
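The factorization described in this abstract can be illustrated with a minimal sketch. It assumes that each per-dimension policy is a value function that observes the actions already chosen for earlier dimensions; that conditioning, and the names `joint_action_count`, `factorized_action_count`, `select_sequential`, and the `QFunction` signature, are hypothetical and not taken from the paper or from DACBench. The sketch covers greedy action selection only, not learning.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical signature: q(state, previous_actions, candidate_action) -> estimated value.
QFunction = Callable[[Tuple[float, ...], Tuple[int, ...], int], float]


def joint_action_count(n_choices: Sequence[int]) -> int:
    """Outputs needed by a single policy over the joint action space (product)."""
    total = 1
    for n in n_choices:
        total *= n
    return total


def factorized_action_count(n_choices: Sequence[int]) -> int:
    """Outputs needed when one policy is learned per action dimension (sum)."""
    return sum(n_choices)


def select_sequential(
    state: Tuple[float, ...],
    q_functions: Sequence[QFunction],
    n_choices: Sequence[int],
) -> Tuple[int, ...]:
    """Pick one action per dimension, in order, greedily with respect to each
    dimension's value function. Later dimensions see earlier choices."""
    chosen: Tuple[int, ...] = ()
    for q, n in zip(q_functions, n_choices):
        best = max(range(n), key=lambda a: q(state, chosen, a))
        chosen += (best,)
    return chosen


# Toy coupled problem (an assumption for illustration): dimension 0 prefers
# action 2, and dimension 1 is best when it matches dimension 0.
def q0(state, previous, action):
    return -abs(action - 2)


def q1(state, previous, action):
    return -abs(action - previous[0])


print(joint_action_count([3, 3, 3, 3]))       # size of the joint space
print(factorized_action_count([3, 3, 3, 3]))  # size when factorized
print(select_sequential((0.0,), [q0, q1], [3, 3]))
```

In this toy setting the second value function can only coordinate with the first because it receives the earlier choice as input; a fully independent per-dimension policy would not have that information.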

action dimension, action space, benchmark, (13 more...)

arXiv.org Artificial Intelligence

2407.05789

Country:

Europe > Germany > Baden-Württemberg > Freiburg (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Batch Bayesian Optimization via Simulation Matching

Neural Information Processing Systems | Apr-6-2023, 13:41:25 GMT

Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. There are a number of effective sequential policies for selecting the individual inputs. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible.

batch bayesian optimization, sequential policy, simulation matching, (2 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (0.43)

Treatment recommendation with distributional targets

Kock, Anders Bredahl, Preinerstorfer, David, Veliyev, Bezirgen

arXiv.org Machine Learning | May-27-2020

We study the problem of a decision maker who must provide the best possible treatment recommendation based on an experiment. The desirability of the outcome distribution resulting from the policy recommendation is measured through a functional capturing the distributional characteristic that the decision maker is interested in optimizing. This could be, e.g., its inherent inequality, welfare, level of poverty, or its distance to a desired outcome distribution. If the functional of interest is not quasi-convex or if there are constraints, the optimal recommendation may be a mixture of treatments. This vastly expands the set of recommendations that must be considered. We characterize the difficulty of the problem by obtaining maximal expected regret lower bounds. Furthermore, we propose two regret-optimal policies. The first policy is static and thus applicable irrespective of whether subjects arrive sequentially during the experimental phase. The second policy exploits the sequential arrival of subjects by successively eliminating inferior treatments, and thus spends the sampling effort where it is most needed.
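The idea of successively eliminating inferior treatments can be sketched as follows. This is a generic successive-elimination loop, not the authors' policy: it assumes outcomes in [0, 1], uses the sample mean as a stand-in for the functional of interest, and uses a simple Hoeffding-style confidence radius without a union bound. The names `successive_elimination`, `samplers`, `functional`, and `delta` are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def mean(values: Sequence[float]) -> float:
    """Sample mean, used here as a placeholder for the functional of interest."""
    return sum(values) / len(values)


def successive_elimination(
    samplers: Sequence[Callable[[], float]],
    functional: Callable[[Sequence[float]], float] = mean,
    delta: float = 0.05,
    max_rounds: int = 1000,
) -> List[int]:
    """Return the indices of treatments that survive elimination.

    Each round assigns one subject to every surviving treatment, then drops
    any treatment whose upper confidence bound falls below the best lower
    confidence bound. Assumes outcomes lie in [0, 1].
    """
    active = list(range(len(samplers)))
    outcomes: List[List[float]] = [[] for _ in samplers]
    for n in range(1, max_rounds + 1):
        if len(active) == 1:
            break
        for k in active:
            outcomes[k].append(samplers[k]())
        # Every surviving treatment has exactly n observations at this point.
        radius = math.sqrt(math.log(2.0 / delta) / (2.0 * n))
        estimates = {k: functional(outcomes[k]) for k in active}
        best_lower = max(estimates.values()) - radius
        active = [k for k in active if estimates[k] + radius >= best_lower]
    return active


# Deterministic toy treatments (an assumption, to keep the example reproducible).
treatments = [lambda: 0.2, lambda: 0.5, lambda: 0.9]
print(successive_elimination(treatments))
```

Because eliminated treatments receive no further subjects, the remaining sampling budget is spent only on treatments that are still plausible candidates, which is the behaviour the abstract attributes to the second policy.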

artificial intelligence, assumption 2, machine learning, (18 more...)

arXiv.org Machine Learning

2005.09717

Country:

North America > United States > New York (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning (0.46)

Batch Bayesian Optimization via Simulation Matching

Azimi, Javad, Fern, Alan, Fern, Xiaoli Z.

Neural Information Processing Systems | Feb-15-2020, 00:41:25 GMT

Bayesian optimization methods are often used to optimize unknown functions that are costly to evaluate. Typically, these methods sequentially select inputs to be evaluated one at a time based on a posterior over the unknown function that is updated after each evaluation. There are a number of effective sequential policies for selecting the individual inputs. In many applications, however, it is desirable to perform multiple evaluations in parallel, which requires selecting batches of multiple inputs to evaluate at once. In this paper, we propose a novel approach to batch Bayesian optimization, providing a policy for selecting batches of inputs with the goal of optimizing the function as efficiently as possible.

batch bayesian optimization, sequential policy, simulation matching, (2 more...)

Neural Information Processing Systems

Genre: Research Report (0.43)

Technology: Information Technology > Artificial Intelligence (0.48)

Efficient Object Detection in Large Images using Deep Reinforcement Learning

#artificialintelligence | Dec-15-2019, 03:06:53 GMT

Reinforcement Learning for Efficient Detection. Reinforcement learning (RL) has recently been used to (1) replace classical detectors such as SSD and Faster-RCNN, (2) replace exhaustive box proposal techniques in two-stage detectors, and (3) find ROIs in very large images to run a detector on. Most of the methods proposed in these categories focus on learning sequential policies. Under category (1), [3, 29] proposed top-down sequential object detection models trained with the Q-learning algorithm. Most of the RL methods associated with object detection fall into category (2). For example, [16] recursively divides up an image in a top-down approach where the divisions are decided by the RL agent. The box proposals returned by the agent are then passed through Fast-RCNN.

deep reinforcement learning, efficient object detection, reinforcement learning, (9 more...)

#artificialintelligence

Technology: Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Efficient nonmyopic batch active search

Jiang, Shali, Malkomes, Gustavo, Abbott, Matthew, Moseley, Benjamin, Garnett, Roman

Neural Information Processing Systems | Dec-31-2018

Active search is a learning paradigm for actively identifying as many members of a given class as possible. A critical target scenario is high-throughput screening for scientific discovery, such as drug or materials discovery. In these settings, specialized instruments can often evaluate multiple points simultaneously; however, all existing work on active search focuses on sequential acquisition. We bridge this gap, addressing batch active search from both the theoretical and practical perspective. We first derive the Bayesian optimal policy for this problem, then prove a lower bound on the performance gap between sequential and batch optimal policies: the "cost of parallelization." We also propose novel, efficient batch policies inspired by state-of-the-art sequential policies, and develop an aggressive pruning technique that can dramatically speed up computation. We conduct thorough experiments on data from three application domains: a citation network, material science, and drug discovery, testing all proposed policies (14 total) with a wide range of batch sizes. Our results demonstrate that the empirical performance gap matches our theoretical bound, that nonmyopic policies usually significantly outperform myopic alternatives, and that diversity is an important consideration for batch policy design.

artificial intelligence, batch, machine learning, (17 more...)

Neural Information Processing Systems

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.14)
North America > United States > Missouri > St. Louis County > St. Louis (0.04)
North America > Canada > Quebec > Montreal (0.04)
Asia > Middle East > Lebanon (0.04)

Genre: Research Report > New Finding (1.00)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.90)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.93)

Budgeted Optimization with Constrained Experiments

Azimi, Javad, Fern, Xiaoli, Fern, Alan

Journal of Artificial Intelligence Research | May-30-2016

Motivated by a real-world problem, we study a novel budgeted optimization problem where the goal is to optimize an unknown function f(.) given a budget by requesting a sequence of samples from the function. In our setting, however, evaluating the function at precisely specified points is not practically possible due to prohibitive costs. Instead, we can only request constrained experiments. A constrained experiment, denoted by Q, specifies a subset of the input space for the experimenter to sample the function from. The outcome of Q includes a sampled experiment x, and its function output f(x). Importantly, as the constraints of Q become looser, the cost of fulfilling the request decreases, but the uncertainty about the location x increases. Our goal is to manage this trade-off by selecting a set of constrained experiments that best optimize f(.) within the budget. We study this problem in two different settings, the non-sequential (or batch) setting where a set of constrained experiments is selected at once, and the sequential setting where experiments are selected one at a time. We evaluate our proposed methods for both settings using synthetic and real functions. The experimental results demonstrate the efficacy of the proposed methods.

application, experiment, optimization, (13 more...)

Journal of Artificial Intelligence Research

doi: 10.1613/jair.4896

AI Access Foundation

11006

Country: