AITopics

2302.03439

Country:

Europe > United Kingdom > England > Greater London > London (0.14)
Europe > United Kingdom > Scotland > City of Edinburgh > Edinburgh (0.04)

Genre: Research Report (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents > Agent Societies (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceApr-16-2023

Pessimism in the Face of Confounders: Provably Efficient Offline Reinforcement Learning in Partially Observable Markov Decision Processes

Lu, Miao, Min, Yifei, Wang, Zhaoran, Yang, Zhuoran

We study offline reinforcement learning (RL) in partially observable Markov decision processes. In particular, we aim to learn an optimal policy from a dataset collected by a behavior policy which possibly depends on the latent state. Such a dataset is confounded in the sense that the latent state simultaneously affects the action and the observation, which is prohibitive for existing offline RL algorithms. To this end, we propose the \underline{P}roxy variable \underline{P}essimistic \underline{P}olicy \underline{O}ptimization (\texttt{P3O}) algorithm, which addresses the confounding bias and the distributional shift between the optimal and behavior policies in the context of general function approximation. At the core of \texttt{P3O} is a coupled sequence of pessimistic confidence regions constructed via proximal causal inference, which is formulated as minimax estimation. Under a partial coverage assumption on the confounded dataset, we prove that \texttt{P3O} achieves a $n^{-1/2}$-suboptimality, where $n$ is the number of trajectories in the dataset. To our best knowledge, \texttt{P3O} is the first provably efficient offline RL algorithm for POMDPs with a confounded dataset.

artificial intelligence, machine learning, reinforcement learning, (16 more...)

2205.13589

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Asia > China (0.04)

Genre: Research Report (0.81)

Industry: Health & Medicine > Therapeutic Area (0.92)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (1.00)

Chepelianskii, Alexei D., Majumdar, Satya N., Schawe, Hendrik, Trizac, Emmanuel

Metropolis Monte Carlo sampling: convergence, localization transition and optimality

arXiv.org Artificial IntelligenceApr-15-2023

Although Buffon's needle problem [1] may be considered as the earliest documented use of Monte Carlo sampling (18th century), the method was developed at the end of the second world war and dates from the early days of computer use [2, 3]. With the increase in computational power, it has become a pervasive and versatile technique in basic sciences and engineering. It uses random sampling for solving both deterministic and stochastic problems, as found in physics, biology, chemistry, or artificial intelligence [4-10]. Monte Carlo techniques also allow to assess risk in quantitative analysis and decision making [11, 12], and their methodological developments provide tools for economy, epidemiology or archaeology [13]. It is then crucial to understand the type of errors which can be introduced as a consequence of the incomplete convergence of such algorithms. Our interest goes to Markov Chain Monte Carlo techniques [14, 15], that create correlated random samples from a target distribution; a special emphasis is put on the relaxation rate of these methods. From the target probability distribution, a sequence of samples is obtained by a random walk, with appropriate transition probabilities. The walker's density evolves at long times towards the target distribution, and quantities of interest follow from the law of large numbers and other methods of statistical inference [11-19]. The Monte-Carlo method and its modern developments [20-30] have now been adopted in many topics in and outside physics.

eigenvalue, localization transition, master equation, (13 more...)

doi: 10.1088/1742-5468/ad002d

2207.10488

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > France (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
(2 more...)

Genre: Research Report (0.63)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.35)

Task-Relevant Failure Detection for Trajectory Predictors in Autonomous Vehicles

Farid, Alec, Veer, Sushant, Ivanovic, Boris, Leung, Karen, Pavone, Marco

In modern autonomy stacks, prediction modules are paramount to planning motions in the presence of other mobile agents. However, failures in prediction modules can mislead the downstream planner into making unsafe decisions. Indeed, the high uncertainty inherent to the task of trajectory forecasting ensures that such mispredictions occur frequently. Motivated by the need to improve safety of autonomous vehicles without compromising on their performance, we develop a probabilistic run-time monitor that detects when a "harmful" prediction failure occurs, i.e., a task-relevant failure detector. We achieve this by propagating trajectory prediction errors to the planning cost to reason about their impact on the AV. Furthermore, our detector comes equipped with performance measures on the false-positive and the false-negative rate and allows for data-free calibration. In our experiments we compared our detector with various others and found that our detector has the highest area under the receiver operator characteristic curve.

artificial intelligence, detector, machine learning, (17 more...)

2207.1238

Country: Oceania > New Zealand > North Island > Auckland Region > Auckland (0.04)

Genre: Research Report > New Finding (0.66)

Industry:

Information Technology (0.94)
Transportation > Ground > Road (0.68)
Automobiles & Trucks (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots > Autonomous Vehicles (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

Metelli, Alberto Maria, Mutti, Mirco, Restelli, Marcello

A Tale of Sampling and Estimation in Discounted Reinforcement Learning

The most relevant problems in discounted reinforcement learning involve estimating the mean of a function under the stationary distribution of a Markov reward process, such as the expected return in policy evaluation, or the policy gradient in policy optimization. In practice, these estimates are produced through a finite-horizon episodic sampling, which neglects the mixing properties of the Markov process. It is mostly unclear how this mismatch between the practical and the ideal setting affects the estimation, and the literature lacks a formal study on the pitfalls of episodic sampling, and how to do it optimally. In this paper, we present a minimax lower bound on the discounted mean estimation problem that explicitly connects the estimation error with the mixing properties of the Markov process and the discount factor. Then, we provide a statistical analysis on a set of notable estimators and the corresponding sampling procedures, which includes the finite-horizon estimators often used in practice. Crucially, we show that estimating the mean by directly sampling from the discounted kernel of the Markov process brings compelling statistical properties w.r.t. the alternative estimators, as it matches the lower bound without requiring a careful tuning of the episode horizon.

artificial intelligence, machine learning, reinforcement learning, (14 more...)

2304.05073

Country:

North America > United States > New York (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Europe > Spain > Valencian Community > Valencia Province > Valencia (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.92)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.88)

Hammar, Kim, Stadler, Rolf

Learning Near-Optimal Intrusion Responses Against Dynamic Attackers

We study automated intrusion response and formulate the interaction between an attacker and a defender as an optimal stopping game where attack and defense strategies evolve through reinforcement learning and self-play. The game-theoretic modeling enables us to find defender strategies that are effective against a dynamic attacker, i.e. an attacker that adapts its strategy in response to the defender strategy. Further, the optimal stopping formulation allows us to prove that optimal strategies have threshold properties. To obtain near-optimal defender strategies, we develop Threshold Fictitious Self-Play (T-FP), a fictitious self-play algorithm that learns Nash equilibria through stochastic approximation. We show that T-FP outperforms a state-of-the-art algorithm for our use case. The experimental part of this investigation includes two systems: a simulation system where defender strategies are incrementally learned and an emulation system where statistics are collected that drive simulation runs and where learned strategies are evaluated. We argue that this approach can produce effective defender strategies for a practical IT infrastructure.

data mining, machine learning, reinforcement learning, (18 more...)

2301.06085

Country:

Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.14)
North America > United States > New York > New York County > New York City (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry:

Leisure & Entertainment > Games (1.00)
Government > Military > Cyberwarfare (0.67)

Technology:

Information Technology > Game Theory (1.00)
Information Technology > Data Science > Data Mining (1.00)
Information Technology > Communications > Networks (1.00)
(3 more...)

Minimax-Optimal Reward-Agnostic Exploration in Reinforcement Learning

Li, Gen, Yan, Yuling, Chen, Yuxin, Fan, Jianqing

This paper studies reward-agnostic exploration in reinforcement learning (RL) -- a scenario where the learner is unware of the reward functions during the exploration stage -- and designs an algorithm that improves over the state of the art. More precisely, consider a finite-horizon non-stationary Markov decision process with $S$ states, $A$ actions, and horizon length $H$, and suppose that there are no more than a polynomial number of given reward functions of interest. By collecting an order of \begin{align*} \frac{SAH^3}{\varepsilon^2} \text{ sample episodes (up to log factor)} \end{align*} without guidance of the reward information, our algorithm is able to find $\varepsilon$-optimal policies for all these reward functions, provided that $\varepsilon$ is sufficiently small. This forms the first reward-agnostic exploration scheme in this context that achieves provable minimax optimality. Furthermore, once the sample size exceeds $\frac{S^2AH^3}{\varepsilon^2}$ episodes (up to log factor), our algorithm is able to yield $\varepsilon$ accuracy for arbitrarily many reward functions (even when they are adversarially designed), a task commonly dubbed as ``reward-free exploration.'' The novelty of our algorithm design draws on insights from offline RL: the exploration scheme attempts to maximize a critical reward-agnostic quantity that dictates the performance of offline RL, while the policy learning paradigm leverages ideas from sample-optimal offline RL paradigms.

artificial intelligence, machine learning, reinforcement learning, (13 more...)

2304.07278

Country:

North America > United States > Pennsylvania > Philadelphia County > Philadelphia (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Asia > Middle East > Jordan (0.04)
North America > United States > New Jersey > Mercer County > Princeton (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.48)

Ferraro, Stefano, Van de Maele, Toon, Verbelen, Tim, Dhoedt, Bart

Symmetry and Complexity in Object-Centric Deep Active Inference Models

Humans perceive and interact with hundreds of objects every day. In doing so, they need to employ mental models of these objects and often exploit symmetries in the object's shape and appearance in order to learn generalizable and transferable skills. Active inference is a first principles approach to understanding and modeling sentient agents. It states that agents entertain a generative model of their environment, and learn and act by minimizing an upper bound on their surprisal, i.e. their Free Energy. The Free Energy decomposes into an accuracy and complexity term, meaning that agents favor the least complex model, that can accurately explain their sensory observations. In this paper, we investigate how inherent symmetries of particular objects also emerge as symmetries in the latent state space of the generative model learnt under deep active inference. In particular, we focus on object-centric representations, which are trained from pixels to predict novel object views as the agent moves its viewpoint. First, we investigate the relation between model complexity and symmetry exploitation in the state space. Second, we do a principal component analysis to demonstrate how the model encodes the principal axis of symmetry of the object in the latent space. Finally, we also demonstrate how more symmetrical representations can be exploited for better generalization in the context of manipulation.

artificial intelligence, machine learning, symmetry, (18 more...)

doi: 10.1098/rsfs.2022.0077

2304.14493

Country:

Europe > France (0.04)
Europe > Belgium > Flanders > East Flanders > Ghent (0.04)
Asia > Japan > Honshū > Tōhoku > Fukushima Prefecture > Fukushima (0.04)

Genre: Research Report (0.50)

Industry: Health & Medicine > Therapeutic Area > Neurology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)
(2 more...)

Cooperative guidance of multiple missiles: a hybrid co-evolutionary approach

Lan, Xuejing, Chen, Junda, Zhao, Zhijia, Zou, Tao

Cooperative guidance of multiple missiles is a challenging task with rigorous constraints of time and space consensus, especially when attacking dynamic targets. In this paper, the cooperative guidance task is described as a distributed multi-objective cooperative optimization problem. To address the issues of non-stationarity and continuous control faced by cooperative guidance, the natural evolutionary strategy (NES) is improved along with an elitist adaptive learning technique to develop a novel natural co-evolutionary strategy (NCES). The gradients of original evolutionary strategy are rescaled to reduce the estimation bias caused by the interaction between the multiple missiles. Then, a hybrid co-evolutionary cooperative guidance law (HCCGL) is proposed by integrating the highly scalable co-evolutionary mechanism and the traditional guidance strategy. Finally, three simulations under different conditions demonstrate the effectiveness and superiority of this guidance law in solving cooperative guidance tasks with high accuracy. The proposed co-evolutionary approach has great prospects not only in cooperative guidance, but also in other application scenarios of multi-objective optimization, dynamic optimization and distributed control.

artificial intelligence, evolutionary algorithm, machine learning, (17 more...)

2208.07156

Country: Asia > China > Guangdong Province > Guangzhou (0.04)

Genre: Research Report (0.64)

Industry: Government > Military (0.93)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)

arXiv.org Artificial IntelligenceApr-13-2023

Online Recognition of Incomplete Gesture Data to Interface Collaborative Robots

Simão, M. A., Gibaru, O., Neto, P.

Online recognition of gestures is critical for intuitive human-robot interaction (HRI) and further push collaborative robotics into the market, making robots accessible to more people. The problem is that it is difficult to achieve accurate gesture recognition in real unstructured environments, often using distorted and incomplete multisensory data. This paper introduces an HRI framework to classify large vocabularies of interwoven static gestures (SGs) and dynamic gestures (DGs) captured with wearable sensors. DG features are obtained by applying data dimensionality reduction to raw data from sensors (resampling with cubic interpolation and principal component analysis). Experimental tests were conducted using the UC2017 hand gesture dataset with samples from eight different subjects. The classification models show an accuracy of 95.6% for a library of 24 SGs with a random forest and 99.3% for 10 DGs using artificial neural networks. These results compare equally or favorably with different commonly used classifiers. Long short-term memory deep networks achieved similar performance in online frame-by-frame classification using raw incomplete data, performing better in terms of accuracy than static models with specially crafted features, but worse in training and inference time. The recognized gestures are used to teleoperate a robot in a collaborative process that consists in preparing a breakfast meal.

accuracy, artificial intelligence, machine learning, (19 more...)

doi: 10.1109/TIE.2019.2891449

2304.06777

Country:

North America > United States > California > San Francisco County > San Francisco (0.14)
Europe > Portugal > Coimbra > Coimbra (0.04)
North America > Montserrat (0.04)
(4 more...)

Genre: Research Report (1.00)

Industry: Education (0.68)

Technology:

Information Technology > Artificial Intelligence > Robots (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Undirected Networks > Markov Models (0.46)