AITopics

2112.09429

Country:

North America > United States > Washington > King County > Seattle (0.14)
Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)

Genre: Research Report (1.00)

Industry:

Information Technology > Security & Privacy (1.00)
Health & Medicine > Health Care Technology > Medical Record (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Icarte, Rodrigo Toro, Waldie, Ethan, Klassen, Toryn Q., Valenzano, Richard, Castro, Margarita P., McIlraith, Sheila A.

Learning Reward Machines: A Study in Partially Observable Reinforcement Learning

arXiv.org Artificial IntelligenceDec-17-2021

Reinforcement learning (RL) is a central problem in artificial intelligence. This problem consists of defining artificial agents that can learn optimal behaviour by interacting with an environment -- where the optimal behaviour is defined with respect to a reward signal that the agent seeks to maximize. Reward machines (RMs) provide a structured, automata-based representation of a reward function that enables an RL agent to decompose an RL problem into structured subproblems that can be efficiently learned via off-policy learning. Here we show that RMs can be learned from experience, instead of being specified by the user, and that the resulting problem decomposition can be used to effectively solve partially observable RL problems. We pose the task of learning RMs as a discrete optimization problem where the objective is to find an RM that decomposes the problem into a set of subproblems such that the combination of their optimal memoryless policies is an optimal policy for the original problem. We show the effectiveness of this approach on three partially observable domains, where it significantly outperforms A3C, PPO, and ACER, and discuss its advantages, limitations, and broader potential.

agent, reward machine, rm state, (16 more...)

2112.09477

Country:

North America > Canada > Ontario > Toronto (0.14)
South America > Chile (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report > New Finding (0.46)

Industry:

Government (0.67)
Leisure & Entertainment > Games (0.45)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.68)

Song, Yunlong, Scaramuzza, Davide

Policy Search for Model Predictive Control with Application to Agile Drone Flight

arXiv.org Artificial IntelligenceDec-16-2021

Policy Search and Model Predictive Control~(MPC) are two different paradigms for robot control: policy search has the strength of automatically learning complex policies using experienced data, while MPC can offer optimal control performance using models and trajectory optimization. An open research question is how to leverage and combine the advantages of both approaches. In this work, we provide an answer by using policy search for automatically choosing high-level decision variables for MPC, which leads to a novel policy-search-for-model-predictive-control framework. Specifically, we formulate the MPC as a parameterized controller, where the hard-to-optimize decision variables are represented as high-level policies. Such a formulation allows optimizing policies in a self-supervised fashion. We validate this framework by focusing on a challenging problem in agile drone flight: flying a quadrotor through fast-moving gates. Experiments show that our controller achieves robust and real-time control performance in both simulation and the real world. The proposed framework offers a new perspective for merging learning and control.

artificial intelligence, machine learning, trajectory, (17 more...)

2112.0385

Country:

Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East (0.14)

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.91)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles > Drones (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty (0.93)

Bertsimas, Dimitris, Boix, Xavier, Carballo, Kimberly Villalobos, Hertog, Dick den

A Robust Optimization Approach to Deep Learning

arXiv.org Machine LearningDec-16-2021

Many state-of-the-art adversarial training methods leverage upper bounds of the adversarial loss to provide security guarantees. Yet, these methods require computations at each training step that can not be incorporated in the gradient for backpropagation. We introduce a new, more principled approach to adversarial training based on a closed form solution of an upper bound of the adversarial loss, which can be effectively trained with backpropagation. This bound is facilitated by state-of-the-art tools from robust optimization. We derive two new methods with our approach. The first method (Approximated Robust Upper Bound or aRUB) uses the first order approximation of the network as well as basic tools from linear robust optimization to obtain an approximate upper bound of the adversarial loss that can be easily implemented. The second method (Robust Upper Bound or RUB), computes an exact upper bound of the adversarial loss. Across a variety of tabular and vision data sets we demonstrate the effectiveness of our more principled approach -- RUB is substantially more robust than state-of-the-art methods for larger perturbations, while aRUB matches the performance of state-of-the-art methods for small perturbations. Also, both RUB and aRUB run faster than standard adversarial training (at the expense of an increase in memory). All the code to reproduce the results can be found at https://github.com/kimvc7/Robustness.

adversarial accuracy, neural network, robust optimization approach, (12 more...)

2112.09279

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Europe > Netherlands > North Holland > Amsterdam (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre:

Research Report > Promising Solution (0.54)
Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.47)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

arXiv.org Machine LearningDec-16-2021

Analysis of Generalized Bregman Surrogate Algorithms for Nonsmooth Nonconvex Statistical Learning

She, Yiyuan, Wang, Zhifeng, Jin, Jiuwu

Modern statistical applications often involve minimizing an objective function that may be nonsmooth and/or nonconvex. This paper focuses on a broad Bregman-surrogate algorithm framework including the local linear approximation, mirror descent, iterative thresholding, DC programming and many others as particular instances. The recharacterization via generalized Bregman functions enables us to construct suitable error measures and establish global convergence rates for nonconvex and nonsmooth objectives in possibly high dimensions. For sparse learning problems with a composite objective, under some regularity conditions, the obtained estimators as the surrogate's fixed points, though not necessarily local minimizers, enjoy provable statistical guarantees, and the sequence of iterates can be shown to approach the statistical truth within the desired accuracy geometrically fast. The paper also studies how to design adaptive momentum based accelerations without assuming convexity or smoothness by carefully controlling stepsize and relaxation parameters.

algorithm, convergence, regularity condition, (16 more...)

2112.09191

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)
North America > Canada > British Columbia (0.04)
(3 more...)

Genre:

Research Report (0.63)
Instructional Material > Course Syllabus & Notes (0.45)

Industry: Education (0.66)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

She, Yiyuan, Wang, Zhifeng, Shen, Jiahui

Gaining Outlier Resistance with Progressive Quantiles: Fast Algorithms and Theoretical Studies

arXiv.org Machine LearningDec-15-2021

Outliers widely occur in big-data applications and may severely affect statistical estimation and inference. In this paper, a framework of outlier-resistant estimation is introduced to robustify an arbitrarily given loss function. It has a close connection to the method of trimming and includes explicit outlyingness parameters for all samples, which in turn facilitates computation, theory, and parameter tuning. To tackle the issues of nonconvexity and nonsmoothness, we develop scalable algorithms with implementation ease and guaranteed fast convergence. In particular, a new technique is proposed to alleviate the requirement on the starting point such that on regular datasets, the number of data resamplings can be substantially reduced. Based on combined statistical and computational treatments, we are able to perform nonasymptotic analysis beyond M-estimation. The obtained resistant estimators, though not necessarily globally or even locally optimal, enjoy minimax rate optimality in both low dimensions and high dimensions. Experiments in regression, classification, and neural networks show excellent performance of the proposed methodology at the occurrence of gross outliers.

regression, regularity condition, statistics, (16 more...)

2112.08471

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > New Jersey > Hudson County > Hoboken (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre: Research Report (1.00)

Industry: Health & Medicine (0.92)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

arXiv.org Artificial IntelligenceDec-15-2021

MMO: Meta Multi-Objectivization for Software Configuration Tuning

Chen, Tao, Li, Miqing

Software configuration tuning is essential for optimizing a given performance objective (e.g., minimizing latency). Yet, due to the software's intrinsically complex configuration landscape and expensive measurement, there has been a rather mild success, particularly in preventing the search from being trapped in local optima. To address this issue, in this paper we take a different perspective. Instead of focusing on improving the optimizer, we work on the level of optimization model and propose a meta multi-objectivization (MMO) model that considers an auxiliary performance objective (e.g., throughput in addition to latency). What makes this model unique is that we do not optimize the auxiliary performance objective, but rather use it to make similarly-performing while different configurations less comparable (i.e. Pareto nondominated to each other), thus preventing the search from being trapped in local optima. Importantly through a new normalization method we show how to effectively use the MMO model without worrying about its weight -- the only yet highly sensitive parameter that can affect its effectiveness. Experiments on 22 cases from 11 real-world software systems/environments confirm that our MMO model with the new normalization performs better than its state-of-the-art single-objective counterparts on 82% cases while achieving up to 2.09x speedup. For 67% of the cases, the new normalization also enables the MMO model to outperform the instance when using it with the normalization used in our prior FSE work under pre-tuned best weights, saving a great amount of resources which would be otherwise necessary to find a good weight. We also demonstrate that the MMO model with the new normalization can consolidate Flash, a recent model-based tuning tool, on 68% of the cases with 1.22x speedup in general.

configuration, objective, performance objective, (13 more...)

2112.07303

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Europe > Spain > Galicia > Madrid (0.04)
(34 more...)

Genre:

Research Report > New Finding (0.93)
Research Report > Experimental Study (0.70)

Industry: Information Technology (0.46)

Technology:

Information Technology > Software Engineering (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)

Hanaka, Tesshu, Kobayashi, Yasuaki, Kurita, Kazuhiro, Lee, See Woo, Otachi, Yota

Computing Diverse Shortest Paths Efficiently: A Theoretical and Experimental Study

arXiv.org Artificial IntelligenceDec-15-2021

Finding diverse solutions in combinatorial problems recently has received considerable attention (Baste et al. 2020; Fomin et al. 2020; Hanaka et al. 2021). In this paper we study the following type of problems: given an integer $k$, the problem asks for $k$ solutions such that the sum of pairwise (weighted) Hamming distances between these solutions is maximized. Such solutions are called diverse solutions. We present a polynomial-time algorithm for finding diverse shortest $st$-paths in weighted directed graphs. Moreover, we study the diverse version of other classical combinatorial problems such as diverse weighted matroid bases, diverse weighted arborescences, and diverse bipartite matchings. We show that these problems can be solved in polynomial time as well. To evaluate the practical performance of our algorithm for finding diverse shortest $st$-paths, we conduct a computational experiment with synthetic and real-world instances.The experiment shows that our algorithm successfully computes diverse solutions within reasonable computational time.

algorithm, graph, polynomial time, (13 more...)

2112.05403

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Asia > Japan > Honshū > Kansai > Kyoto Prefecture > Kyoto (0.04)
South America > Argentina > Pampas > Buenos Aires F.D. > Buenos Aires (0.04)
(8 more...)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.93)

arXiv.org Machine LearningDec-14-2021

CEM-GD: Cross-Entropy Method with Gradient Descent Planner for Model-Based Reinforcement Learning

Huang, Kevin, Lale, Sahin, Rosolia, Ugo, Shi, Yuanyuan, Anandkumar, Anima

Current state-of-the-art model-based reinforcement learning algorithms use trajectory sampling methods, such as the Cross-Entropy Method (CEM), for planning in continuous control settings. These zeroth-order optimizers require sampling a large number of trajectory rollouts to select an optimal action, which scales poorly for large prediction horizons or high dimensional action spaces. First-order methods that use the gradients of the rewards with respect to the actions as an update can mitigate this issue, but suffer from local optima due to the non-convex optimization landscape. To overcome these issues and achieve the best of both worlds, we propose a novel planner, Cross-Entropy Method with Gradient Descent (CEM-GD), that combines first-order methods with CEM. At the beginning of execution, CEM-GD uses CEM to sample a significant amount of trajectory rollouts to explore the optimization landscape and avoid poor local minima. It then uses the top trajectories as initialization for gradient descent and applies gradient updates to each of these trajectories to find the optimal action sequence. At each subsequent time step, however, CEM-GD samples much fewer trajectories from CEM before applying gradient updates. We show that as the dimensionality of the planning problem increases, CEM-GD maintains desirable performance with a constant small number of samples by using the gradient information, while avoiding local optima using initially well-sampled trajectories. Furthermore, CEM-GD achieves better performance than CEM on a variety of continuous control benchmarks in MuJoCo with 100x fewer samples per time step, resulting in around 25% less computation time and 10% less memory usage. The implementation of CEM-GD is available at $\href{https://github.com/KevinHuang8/CEM-GD}{\text{https://github.com/KevinHuang8/CEM-GD}}$.

action sequence, cem, sequence, (17 more...)

2112.07746

Country: Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)

Genre: Research Report (0.84)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)

Gramacy, Robert B., Sauer, Annie, Wycoff, Nathan

Triangulation candidates for Bayesian optimization

arXiv.org Machine LearningDec-14-2021

Bayesian optimization is a form of sequential design: idealize input-output relationships with a suitably flexible nonlinear regression model; fit to data from an initial experimental campaign; devise and optimize a criterion for selecting the next experimental condition(s) under the fitted model (e.g., via predictive equations) to target outcomes of interest (say minima); repeat after acquiring output under those conditions and updating the fit. In many situations this "inner optimization" over the new-data acquisition criterion is cumbersome because it is non-convex/highly multi-modal, may be non-differentiable, or may otherwise thwart numerical optimizers, especially when inference requires Monte Carlo. In such cases it is not uncommon to replace continuous search with a discrete one over random candidates. Here we propose using candidates based on a Delaunay triangulation of the existing input design. In addition to detailing construction of these "tricands", based on a simple wrapper around a conventional convex hull library, we promote several advantages based on properties of the geometric criterion involved. We then demonstrate empirically how tricands can lead to better Bayesian optimization performance compared to both numerically optimized acquisitions and random candidate-based alternatives on benchmark problems.

acquisition, optimization, tricand, (15 more...)

2112.07457

Country:

North America > United States > Virginia (0.04)
North America > United States > New York > New York County > New York City (0.04)
North America > United States > Florida > Palm Beach County > Boca Raton (0.04)
(2 more...)

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Search (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.54)