AITopics

2201.12518

Country:

Asia > Middle East > Jordan (0.04)
Asia > China > Beijing > Beijing (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.90)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.87)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.69)
(2 more...)

arXiv.org Artificial IntelligenceJan-28-2022

Towards Safe Reinforcement Learning with a Safety Editor Policy

Yu, Haonan, Xu, Wei, Zhang, Haichao

We consider the safe reinforcement learning (RL) problem of maximizing utility while satisfying provided constraints. Since we do not assume any prior knowledge or pre-training of the safety concept, we are interested in asymptotic constraint satisfaction. A popular approach in this line of research is to combine the Lagrangian method with a model-free RL algorithm to adjust the weight of the constraint reward dynamically. It relies on a single policy to handle the conflict between utility and constraint rewards, which is often challenging. Inspired by the safety layer design (Dalal et al., 2018), we propose to separately learn a safety editor policy that transforms potentially unsafe actions output by a utility maximizer policy into safe ones. The safety editor is trained to maximize the constraint reward while minimizing a hinge loss of the utility Q values of actions before and after the edit. On 12 custom Safety Gym (Ray et al., 2019) tasks and 2 safe racing tasks with very harsh constraint thresholds, our approach demonstrates outstanding utility performance while complying with the constraints. Ablation studies reveal that our two-policy design is critical. Simply doubling the model capacity of typical single-policy approaches will not lead to comparable results. The Q hinge loss is also important in certain circumstances, and replacing it with the usual L2 distance could fail badly.

constraint reward, safe reinforcement learning, seditor, (12 more...)

2201.12427

Country:

Asia > Middle East > Jordan (0.04)
North America > United States > California > Santa Clara County > Cupertino (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Leisure & Entertainment > Games (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Scholl, Philipp, Dietrich, Felix, Otte, Clemens, Udluft, Steffen

Safe Policy Improvement Approaches on Discrete Markov Decision Processes

arXiv.org Artificial IntelligenceJan-28-2022

Safe Policy Improvement (SPI) aims at provable guarantees that a learned policy is at least approximately as good as a given baseline policy. Building on SPI with Soft Baseline Bootstrapping (Soft-SPIBB) by Nadjahi et al., we identify theoretical issues in their approach, provide a corrected theory, and derive a new algorithm that is provably safe on finite Markov Decision Processes (MDP). Additionally, we provide a heuristic algorithm that exhibits the best performance among many state of the art SPI algorithms on two different benchmarks. Furthermore, we introduce a taxonomy of SPI algorithms and empirically show an interesting property of two classes of SPI algorithms: while the mean performance of algorithms that incorporate the uncertainty as a penalty on the action-value is higher, actively restricting the set of policies more consistently produces good policies and is, thus, safer.

algorithm, safe policy improvement approach, state-action pair, (12 more...)

2201.12175

Country:

Europe > Germany > Bavaria > Upper Bavaria > Munich (0.05)
North America > United States > Massachusetts (0.04)
North America > United States > California > Los Angeles County > Santa Monica (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.95)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.86)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.69)

Kajino, Hiroshi, Miyaguchi, Kohei, Osogami, Takayuki

Biases in In Silico Evaluation of Molecular Optimization Methods and Bias-Reduced Evaluation Methodology

arXiv.org Machine LearningJan-28-2022

Molecular optimization aims to discover novel molecules with improved properties, which is often formulated as a reinforcement learning problem by modeling the construction of a molecule using a Markov decision process. The performance of such agents is measured by the quality of generated molecules. In the community of machine learning, most of the molecular optimization methods have been verified in silico, i.e., in computer simulation. Since most of the generated molecules are novel, their properties are unknown and we have to resort to a predictor to estimate the properties. However, little attention has been paid to how reliable such estimates are, which makes the existing performance estimates less reliable. In this paper, we study the statistical performance of such performance estimators to enhance our understanding of the evaluation protocol and we discuss several directions to improve it. Let us first introduce a common practice to estimate the performance in silico.

estimator, molecule, performance estimator, (14 more...)

arXiv.org Machine Learning

2201.12163

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.04)

Genre: Research Report (0.64)

Industry: Health & Medicine > Pharmaceuticals & Biotechnology (0.67)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.84)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)

Sergeant-Perthuis, Grégoire

Regionalized optimization

Yedidia, Freeman, Weiss have shown in their reference article, "Constructing Free Energy Approximations and Generalized Belief Propagation Algorithms", that there is a variational principle underlying the General Belief Propagation, by introducing a region-based free energy approximation of the MaxEnt free energy, that we will call the Generalized Bethe free energy. They sketched a proof that fixed points of the General Belief Propagation are critical points of this free energy, this proof was completed in the thesis of Peltre. In this paper we identify a class of optimization problems defined as patching local optimization problems and associated message passing algorithms for which such correspondence between critical points and fix points of the algorithms holds. This framework holds many applications one of which being a PCA for filtered data and a region-based approximation of MaxEnT with stochastic compatibility constraints on the region probabilities. Such approach is particularly adapted for inference with multimodal integration, inference on scenes with multiple views.

artificial intelligence, cost function, optimization problem, (14 more...)

2201.11876

Country:

North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > Japan (0.04)

Genre: Research Report (0.40)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)

Constrained Variational Policy Optimization for Safe Reinforcement Learning

Liu, Zuxin, Cen, Zhepeng, Isenbaev, Vladislav, Liu, Wei, Wu, Zhiwei Steven, Li, Bo, Zhao, Ding

constrained variational policy optimization, constraint, variational distribution, (10 more...)

Safe reinforcement learning (RL) aims to learn policies that satisfy certain constraints before deploying to safety-critical applications. Primal-dual as a prevalent constrained optimization framework suffers from instability issues and lacks optimality guarantees. This paper overcomes the issues from a novel probabilistic inference perspective and proposes an Expectation-Maximization style approach to learn safe policy. We show that the safe RL problem can be decomposed to 1) a convex optimization phase with a non-parametric variational distribution and 2) a supervised learning phase. We show the unique advantages of constrained variational policy optimization by proving its optimality and policy improvement stability. A wide range of experiments on continuous robotic tasks show that the proposed method achieves significantly better performance in terms of constraint satisfaction and sample efficiency than primal-dual baselines.

2201.11927

Country:

North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.87)

Lavinas, Yuri, Aranha, Claus, Ochoa, Gabriela

Search Trajectories Networks of Multiobjective Evolutionary Algorithms

Understanding the search dynamics of multiobjective evolutionary algorithms (MOEAs) is still an open problem. This paper extends a recent network-based tool, search trajectory networks (STNs), to model the behavior of MOEAs. Our approach uses the idea of decomposition, where a multiobjective problem is transformed into several single-objective problems. We show that STNs can be used to model and distinguish the search behavior of two popular multiobjective algorithms, MOEA/D and NSGA-II, using 10 continuous benchmark problems with 2 and 3 objectives. Our findings suggest that we can improve our understanding of MOEAs using STNs for algorithm analysis.

nsga-ii, stn, vector, (14 more...)

2201.11726

Country:

Asia > Japan > Honshū > Kantō > Ibaraki Prefecture > Tsukuba (0.05)
North America > United States > New York > New York County > New York City (0.04)
Europe > United Kingdom > England > Essex > Colchester (0.04)
Asia > Singapore (0.04)

Genre: Research Report > New Finding (0.68)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)

Liu, Daqi, Bober, Miroslaw, Kittler, Josef

Constrained Structure Learning for Scene Graph Generation

As a structured prediction task, scene graph generation aims to build a visually-grounded scene graph to explicitly model objects and their relationships in an input image. Currently, the mean field variational Bayesian framework is the de facto methodology used by the existing methods, in which the unconstrained inference step is often implemented by a message passing neural network. However, such formulation fails to explore other inference strategies, and largely ignores the more general constrained optimization models. In this paper, we present a constrained structure learning method, for which an explicit constrained variational inference objective is proposed. Instead of applying the ubiquitous message-passing strategy, a generic constrained optimization method - entropic mirror descent - is utilized to solve the constrained variational inference step. We validate the proposed generic model on various popular scene graph generation benchmarks and show that it outperforms the state-of-the-art methods.

computer vision and pattern recognition, predicate category, proceedings, (7 more...)

2201.11697

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Surrey > Guildford (0.04)
Europe > Italy > Calabria > Catanzaro Province > Catanzaro (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > Promising Solution (0.34)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.34)

Silva, Vasco D., Finamore, Anna, Henriques, Rui

On the Role of Multi-Objective Optimization to the Transit Network Design Problem

Ongoing traffic changes, including those triggered by the COVID-19 pandemic, reveal the necessity to adapt our public transport systems to the ever-changing users' needs. This work shows that single and multi objective stances can be synergistically combined to better answer the transit network design problem (TNDP). Single objective formulations are dynamically inferred from the rating of networks in the approximated (multi-objective) Pareto Front, where a regression approach is used to infer the optimal weights of transfer needs, times, distances, coverage, and costs. As a guiding case study, the solution is applied to the multimodal public transport network in the city of Lisbon, Portugal. The system takes individual trip data given by smartcard validations at CARRIS buses and METRO subway stations and uses them to estimate the origin-destination demand in the city. Then, Genetic Algorithms are used, considering both single and multi objective approaches, to redesign the bus network that better fits the observed traffic demand. The proposed TNDP optimization proved to improve results, with reductions in objective functions of up to 28.3%. The system managed to extensively reduce the number of routes, and all passenger related objectives, including travel time and transfers per trip, significantly improve. Grounded on automated fare collection data, the system can incrementally redesign the bus network to dynamically handle ongoing changes to the city traffic.

bus network, objective, optimization, (13 more...)

2201.11616

Country:

Europe > Portugal > Lisbon > Lisbon (0.35)
Europe > Italy > Lombardy > Milan (0.04)
Asia > India (0.04)
(2 more...)

Genre: Research Report (0.64)

Industry:

Transportation > Infrastructure & Services (1.00)
Transportation > Ground > Road (0.69)

Technology:

Information Technology > Communications > Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.34)

Maus, Natalie, Jones, Haydn T., Moore, Juston S., Kusner, Matt J., Bradshaw, John, Gardner, Jacob R.

Local Latent Space Bayesian Optimization over Structured Inputs

arXiv.org Machine LearningJan-27-2022

Bayesian optimization over the latent spaces of deep autoencoder models (DAEs) has recently emerged as a promising new approach for optimizing challenging black-box functions over structured, discrete, hard-to-enumerate search spaces (e.g., molecules). Here the DAE dramatically simplifies the search space by mapping inputs into a continuous latent space where familiar Bayesian optimization tools can be more readily applied. Despite this simplification, the latent space typically remains high-dimensional. Thus, even with a well-suited latent space, these approaches do not necessarily provide a complete solution, but may rather shift the structured optimization problem to a high-dimensional one. In this paper, we propose LOL-BO, which adapts the notion of trust regions explored in recent work on high-dimensional Bayesian optimization to the structured setting. By reformulating the encoder to function as both an encoder for the DAE globally and as a deep kernel for the surrogate model within a trust region, we better align the notion of local optimization in the latent space with local optimization in the input space. LOL-BO achieves as much as 20 times improvement over state-of-the-art latent space Bayesian optimization methods across six real-world benchmarks, demonstrating that improvement in optimization strategies is as important as developing better DAE models.

bayesian optimization, latent space, optimization, (13 more...)

arXiv.org Machine Learning

2201.11872

Country:

North America > United States > Pennsylvania (0.04)
North America > United States > New Mexico > Los Alamos County > Los Alamos (0.04)
North America > United States > Massachusetts (0.04)
Europe > Sweden > Stockholm > Stockholm (0.04)

Genre: Research Report (0.82)

Industry: Health & Medicine (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)