AITopics

2209.07744

Country:

Europe (0.04)
Asia > South Korea > Daejeon > Daejeon (0.04)
North America > United States > New York (0.04)
(3 more...)

Genre: Research Report (0.82)

Industry:

Transportation > Ground > Road (1.00)
Energy > Renewable > Solar (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.67)

arXiv.org Artificial IntelligenceDec-30-2022

CARE: Certifiably Robust Learning with Reasoning via Variational Inference

Zhang, Jiawei, Li, Linyi, Zhang, Ce, Li, Bo

Despite great recent advances achieved by deep neural networks (DNNs), they are often vulnerable to adversarial attacks. Intensive research efforts have been made to improve the robustness of DNNs; however, most empirical defenses can be adaptively attacked again, and the theoretically certified robustness is limited, especially on large-scale datasets. One potential root cause of such vulnerabilities for DNNs is that although they have demonstrated powerful expressiveness, they lack the reasoning ability to make robust and reliable predictions. In this paper, we aim to integrate domain knowledge to enable robust learning with the reasoning paradigm. In particular, we propose a certifiably robust learning with reasoning pipeline (CARE), which consists of a learning component and a reasoning component. Concretely, we use a set of standard DNNs to serve as the learning component to make semantic predictions, and we leverage the probabilistic graphical models, such as Markov logic networks (MLN), to serve as the reasoning component to enable knowledge/logic reasoning. However, it is known that the exact inference of MLN (reasoning) is #P-complete, which limits the scalability of the pipeline. To this end, we propose to approximate the MLN inference via variational inference based on an efficient expectation maximization algorithm. In particular, we leverage graph convolutional networks (GCNs) to encode the posterior distribution during variational inference and update the parameters of GCNs (E-step) and the weights of knowledge rules in MLN (M-step) iteratively. We conduct extensive experiments on different datasets and show that CARE achieves significantly higher certified robustness compared with the state-of-the-art baselines. We additionally conducted different ablation studies to demonstrate the empirical robustness of CARE and the effectiveness of different knowledge integration.

artificial intelligence, machine learning, sensor, (17 more...)

2209.05055

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Switzerland > Zürich > Zürich (0.14)
Asia > Middle East > Jordan (0.04)
North America > United States > Illinois > Champaign County > Urbana (0.04)

Genre: Research Report (0.50)

Industry:

Health & Medicine (0.67)
Information Technology > Security & Privacy (0.49)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.87)

Rollout Algorithms and Approximate Dynamic Programming for Bayesian Optimization and Sequential Estimation

Bertsekas, Dimitri

We provide a unifying approximate dynamic programming framework that applies to a broad variety of problems involving sequential estimation. We consider first the construction of surrogate cost functions for the purposes of optimization, and we focus on the special case of Bayesian optimization, using the rollout algorithm and some of its variations. We then discuss the more general case of sequential estimation of a random vector using optimal measurement selection, and its application to problems of stochastic and adaptive control. We distinguish between adaptive control of deterministic and stochastic systems: the former are better suited for the use of rollout, while the latter are well suited for the use of rollout with certainty equivalence approximations. As an example of the deterministic case, we discuss sequential decoding problems, and a rollout algorithm for the approximate solution of the Wordle and Mastermind puzzles, recently developed in the paper [BBB22].

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2212.07998

Country:

North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
Europe > Poland > Lesser Poland Province > Kraków (0.04)
North America > United States > Michigan (0.04)
(2 more...)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.94)
(2 more...)

On Deep Recurrent Reinforcement Learning for Active Visual Tracking of Space Noncooperative Objects

Zhou, Dong, Sun, Guanghui, Zhang, Zhao, Wu, Ligang

Active tracking of space noncooperative object that merely relies on vision camera is greatly significant for autonomous rendezvous and debris removal. Considering its Partial Observable Markov Decision Process (POMDP) property, this paper proposes a novel tracker based on deep recurrent reinforcement learning, named as RAMAVT which drives the chasing spacecraft to follow arbitrary space noncooperative object with high-frequency and near-optimal velocity control commands. To further improve the active tracking performance, we introduce Multi-Head Attention (MHA) module and Squeeze-and-Excitation (SE) layer into RAMAVT, which remarkably improve the representative ability of neural network with almost no extra computational cost. Extensive experiments and ablation study implemented on SNCOAT benchmark show the effectiveness and robustness of our method compared with other state-of-the-art algorithm. The source codes are available on https://github.com/Dongzhou-1996/RAMAVT.

machine learning, ramavt, reinforcement learning, (14 more...)

2212.14304

Country:

Asia > China > Heilongjiang Province > Harbin (0.04)
Asia > India (0.04)

Genre: Research Report (0.82)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Yang, Sikun, Zha, Hongyuan

Estimating Latent Population Flows from Aggregated Data via Inversing Multi-Marginal Optimal Transport

artificial intelligence, inversing multi-marginal optimal transport, machine learning, (2 more...)

We study the problem of estimating latent population flows from aggregated count data. This problem arises when individual trajectories are not available due to privacy issues or measurement fidelity. Instead, the aggregated observations are measured over discrete-time points, for estimating the population flows among states. Most related studies tackle the problems by learning the transition parameters of a time-homogeneous Markov process. Nonetheless, most real-world population flows can be influenced by various uncertainties such as traffic jam and weather conditions. Thus, in many cases, a time-homogeneous Markov model is a poor approximation of the much more complex population flows. To circumvent this difficulty, we resort to a multi-marginal optimal transport (MOT) formulation that can naturally represent aggregated observations with constrained marginals, and encode time-dependent transition matrices by the cost functions. In particular, we propose to estimate the transition flows from aggregated data by learning the cost functions of the MOT framework, which enables us to capture time-varying dynamic patterns. The experiments demonstrate the improved accuracy of the proposed algorithms than the related methods in estimating several real-world transition flows.

2212.14527

Genre: Research Report (0.40)

Industry: Information Technology > Security & Privacy (0.53)

Technology:

Information Technology > Security & Privacy (0.53)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.53)

Angulo, Brian, Panov, Aleksandr, Yakovlev, Konstantin

Policy Optimization to Learn Adaptive Motion Primitives in Path Planning with Dynamic Obstacles

This paper addresses the kinodynamic motion planning for non-holonomic robots in dynamic environments with both static and dynamic obstacles -- a challenging problem that lacks a universal solution yet. One of the promising approaches to solve it is decomposing the problem into the smaller sub problems and combining the local solutions into the global one. The crux of any planning method for non-holonomic robots is the generation of motion primitives that generates solutions to local planning sub-problems. In this work we introduce a novel learnable steering function (policy), which takes into account kinodynamic constraints of the robot and both static and dynamic obstacles. This policy is efficiently trained via the policy optimization. Empirically, we show that our steering function generalizes well to unseen problems. We then plug in the trained policy into the sampling-based and lattice-based planners, and evaluate the resultant POLAMP algorithm (Policy Optimization that Learns Adaptive Motion Primitives) in a range of challenging setups that involve a car-like robot operating in the obstacle-rich parking-lot environments. We show that POLAMP is able to plan collision-free kinodynamic trajectories with success rates higher than 92%, when 50 simultaneously moving obstacles populate the environment showing better performance than the state-of-the-art competitors.

machine learning, obstacle, reinforcement learning, (17 more...)

2212.14307

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (0.82)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Wu, Junlin, Sibai, Hussein, Vorobeychik, Yevgeniy

Certifying Safety in Reinforcement Learning under Adversarial Perturbation Attacks

arXiv.org Artificial IntelligenceDec-28-2022

Function approximation has enabled remarkable advances in applying reinforcement learning (RL) techniques in environments with high-dimensional inputs, such as images, in an end-to-end fashion, mapping such inputs directly to low-level control. Nevertheless, these have proved vulnerable to small adversarial input perturbations. A number of approaches for improving or certifying robustness of end-to-end RL to adversarial perturbations have emerged as a result, focusing on cumulative reward. However, what is often at stake in adversarial scenarios is the violation of fundamental properties, such as safety, rather than the overall reward that combines safety with efficiency. Moreover, properties such as safety can only be defined with respect to true state, rather than the high-dimensional raw inputs to end-to-end policies. To disentangle nominal efficiency and adversarial safety, we situate RL in deterministic partially-observable Markov decision processes (POMDPs) with the goal of maximizing cumulative reward subject to safety constraints. We then propose a partially-supervised reinforcement learning (PSRL) framework that takes advantage of an additional assumption that the true state of the POMDP is known at training time. We present the first approach for certifying safety of PSRL policies under adversarial input perturbations, and two adversarial training approaches that make direct use of PSRL. Our experiments demonstrate both the efficacy of the proposed approach for certifying safety in adversarial environments, and the value of the PSRL framework coupled with adversarial training in improving certified safety while preserving high nominal reward and high-quality predictions of true state.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2212.14115

Country:

North America > United States > New York > New York County > New York City (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > California > Los Angeles County > Pasadena (0.04)

Genre: Research Report (0.64)

Industry: Information Technology (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

arXiv.org Artificial IntelligenceDec-28-2022

Conflict Avoidance in Social Navigation -- a Survey

Mirsky, Reuth, Xiao, Xuesu, Hart, Justin, Stone, Peter

A major goal in robotics is to enable intelligent mobile robots to operate smoothly in shared human-robot environments. One of the most fundamental capabilities in service of this goal is competent navigation in this ``social" context. As a result, there has been a recent surge of research on social navigation; and especially as it relates to the handling of conflicts between agents during social navigation. These developments introduce a variety of models and algorithms, however as this research area is inherently interdisciplinary, many of the relevant papers are not comparable and there is no shared standard vocabulary. This survey aims to bridge this gap by introducing such a common language, using it to survey existing work, and highlighting open problems. It starts by defining the boundaries of this survey to a limited, yet highly common type of social navigation - conflict avoidance. Within this proposed scope, this survey introduces a detailed taxonomy of the conflict avoidance components. This survey then maps existing work into this taxonomy, while discussing papers using its framing. Finally, this paper proposes some future research directions and open problems that are currently on the frontier of social navigation to aid ongoing and future research.

machine learning, navigation, reinforcement learning, (17 more...)

2106.12113

Country:

North America > United States > New York > New York County > New York City (0.04)
Asia > Japan > Honshū > Chūbu > Ishikawa Prefecture > Kanazawa (0.04)
South America > Chile > Santiago Metropolitan Region > Santiago Province > Santiago (0.04)
(6 more...)

Genre:

Overview (1.00)
Research Report > New Finding (0.45)

Industry:

Transportation (0.69)
Health & Medicine (0.46)
Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Robots > Robot Planning & Action (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Planning & Scheduling (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (0.68)
(3 more...)

Kim, Ju-Bong, Choi, Ho-Bin, Han, Youn-Hee

Strangeness-driven Exploration in Multi-Agent Reinforcement Learning

arXiv.org Artificial IntelligenceDec-27-2022

Efficient exploration strategy is one of essential issues in cooperative multi-agent reinforcement learning (MARL) algorithms requiring complex coordination. In this study, we introduce a new exploration method with the strangeness that can be easily incorporated into any centralized training and decentralized execution (CTDE)-based MARL algorithms. The strangeness refers to the degree of unfamiliarity of the observations that an agent visits. In order to give the observation strangeness a global perspective, it is also augmented with the the degree of unfamiliarity of the visited entire state. The exploration bonus is obtained from the strangeness and the proposed exploration method is not much affected by stochastic transitions commonly observed in MARL tasks. To prevent a high exploration bonus from making the MARL training insensitive to extrinsic rewards, we also propose a separate action-value function trained by both extrinsic reward and exploration bonus, on which a behavioral policy to generate transitions is designed based. It makes the CTDE-based MARL algorithms more stable when they are used with an exploration method. Through a comparative evaluation in didactic examples and the StarCraft Multi-Agent Challenge, we show that the proposed exploration method achieves significant performance improvement in the CTDE-based MARL algorithms.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2212.13448

Genre: Research Report > New Finding (0.34)

Industry:

Leisure & Entertainment > Games > Computer Games (0.34)
Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Hwang, Taehyun, Oh, Min-hwan

Model-Based Reinforcement Learning with Multinomial Logistic Function Approximation

arXiv.org Artificial IntelligenceDec-27-2022

We study model-based reinforcement learning (RL) for episodic Markov decision processes (MDP) whose transition probability is parametrized by an unknown transition core with features of state and action. Despite much recent progress in analyzing algorithms in the linear MDP setting, the understanding of more general transition models is very restrictive. In this paper, we establish a provably efficient RL algorithm for the MDP whose state transition is given by a multinomial logistic model. To balance the exploration-exploitation trade-off, we propose an upper confidence bound-based algorithm. We show that our proposed algorithm achieves $\tilde{\mathcal{O}}(d \sqrt{H^3 T})$ regret bound where $d$ is the dimension of the transition core, $H$ is the horizon, and $T$ is the total number of steps. To the best of our knowledge, this is the first model-based RL algorithm with multinomial logistic function approximation with provable guarantees. We also comprehensively evaluate our proposed algorithm numerically and show that it consistently outperforms the existing methods, hence achieving both provable efficiency and practical superior performance.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2212.1354

Genre: Research Report (1.00)

Industry: Energy > Oil & Gas > Upstream (0.34)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Fuzzy Logic (0.64)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.34)