 Undirected Networks


From Tensor Network Quantum States to Tensorial Recurrent Neural Networks

arXiv.org Machine Learning

Tensor networks (TN) have been extensively used to represent the states of quantum many-body physical systems [1-3]. Matrix product states (MPS) are possibly the simplest family of TN, and are suitable to capture the ground state of 1D gapped Hamiltonians [4, 5]. They can be contracted in polynomial time to compute physical quantities exactly, and optimized by density matrix renormalization group (DMRG) [6] when used as variational ansätze. More powerful TN architectures that cannot be efficiently contracted in general have been proposed later, notably projected entangled pair states ...

Considering the relation between neural networks (NN) and TN, the first works focused on the restricted Boltzmann machines (RBM), which are one of the simplest classes of NN. It is impossible to efficiently map an RBM onto a TN, as they correspond to string-bond states with an arbitrary nonlocal geometry [28]. This result was later refined to show that an RBM may correspond to an MPS with an exponentially large bond dimension, and only short-range RBM can be mapped onto efficiently computable entangled plaquette states [31]. Similar results have been obtained that deep Boltzmann machines with proper constraints can be mapped onto TN that are efficiently computable through transfer matrix methods [32].
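The claim that an MPS can be contracted in polynomial time can be illustrated with a minimal sketch. It assumes real-valued tensors and open boundaries, and every name in it (`random_mps`, `mps_norm_squared`, and so on) is hypothetical rather than taken from the paper. The norm is accumulated site by site, and a brute-force sum over all configurations serves as a cross-check on small chains.

```python
import itertools
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    """Transpose a matrix given as a list of rows."""
    return [list(row) for row in zip(*a)]

def random_mps(n_sites, bond_dim, phys_dim=2, seed=0):
    """Random real MPS: tensors[i][s] is a (left bond x right bond) matrix."""
    rng = random.Random(seed)
    dims = [1] + [bond_dim] * (n_sites - 1) + [1]
    return [[[[rng.uniform(-1, 1) for _ in range(dims[i + 1])]
              for _ in range(dims[i])]
             for _ in range(phys_dim)]
            for i in range(n_sites)]

def mps_norm_squared(tensors):
    """Contract <psi|psi> one site at a time; cost is linear in chain length."""
    env = [[1.0]]  # 1x1 left boundary environment
    for site in tensors:
        dim_right = len(site[0][0])
        new_env = [[0.0] * dim_right for _ in range(dim_right)]
        for mat in site:  # sum over the physical index
            term = matmul(matmul(transpose(mat), env), mat)
            for i in range(dim_right):
                for j in range(dim_right):
                    new_env[i][j] += term[i][j]
        env = new_env
    return env[0][0]

def brute_force_norm_squared(tensors, phys_dim=2):
    """Sum squared amplitudes over all configurations (exponential cost)."""
    total = 0.0
    for config in itertools.product(range(phys_dim), repeat=len(tensors)):
        vec = [[1.0]]
        for site, s in zip(tensors, config):
            vec = matmul(vec, site[s])
        total += vec[0][0] ** 2
    return total
```

The site-by-site sweep touches each tensor once, while the brute-force check enumerates every basis state, which is why only the former scales to long chains.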


Probabilistic Planning with Partially Ordered Preferences over Temporal Goals

arXiv.org Artificial Intelligence

In this paper, we study planning in stochastic systems, modeled as Markov decision processes (MDPs), with preferences over temporally extended goals. Prior work on temporal planning with preferences assumes that the user preferences form a total order, meaning that every pair of outcomes is comparable. In this work, we consider the case where the preferences over possible outcomes are a partial order rather than a total order. We first introduce a variant of the deterministic finite automaton, referred to as a preference DFA, for specifying the user's preferences over temporally extended goals. Based on order theory, we translate the preference DFA to a preference relation over policies for probabilistic planning in a labeled MDP. In this treatment, a most preferred policy induces a weak-stochastic nondominated probability distribution over the finite paths in the MDP. The proposed planning algorithm hinges on the construction of a multi-objective MDP. We prove that a weak-stochastic nondominated policy given the preference specification is Pareto-optimal in the constructed multi-objective MDP, and vice versa. Throughout the paper, we employ a running example to demonstrate the proposed preference specification and solution approaches. We show the efficacy of our algorithm using the example with detailed analysis, and then discuss possible future directions.
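To make the notion of Pareto optimality concrete, the following is a minimal sketch of a nondominance filter over objective vectors. It is not the paper's planning algorithm. The policy names and values are invented for illustration, and each vector is assumed to hold one expected value per objective of a multi-objective MDP, with larger being better.

```python
def dominates(u, v):
    """True if u is at least as good as v everywhere and strictly better somewhere."""
    return (all(a >= b for a, b in zip(u, v))
            and any(a > b for a, b in zip(u, v)))

def pareto_front(values):
    """Return the sorted names whose value vectors no other vector dominates."""
    return sorted(
        name for name, vec in values.items()
        if not any(dominates(other, vec)
                   for other_name, other in values.items()
                   if other_name != name)
    )

# Hypothetical per-objective values for four candidate policies.
candidate_values = {
    "pi_a": (0.8, 0.1),
    "pi_b": (0.5, 0.5),
    "pi_c": (0.4, 0.4),
    "pi_d": (0.1, 0.9),
}

front = pareto_front(candidate_values)  # ['pi_a', 'pi_b', 'pi_d']
```

Here `pi_c` is dropped because `pi_b` is better in both objectives, while the remaining three policies are mutually incomparable, which is the situation a partial order over outcomes produces.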


Resource-aware Probability-based Collaborative Odor Source Localization Using Multiple UAVs

arXiv.org Artificial Intelligence

Benefitting from UAVs' characteristics of flexible deployment and controllable movement in 3D space, odor source localization with multiple UAVs has been a hot research area in recent years. Considering the limited resources and insufficient battery capacities of UAVs, it is necessary to locate the odor source quickly with low-complexity computation and minimal interaction under complicated environmental states. To this end, we propose a multi-UAV collaboration based odor source localization (MUC-OSL) method, where source estimation and UAV navigation are iteratively performed, aiming to accelerate the searching process and reduce the resource consumption of UAVs. Specifically, in the source estimation phase, we present a collaborative particle filter algorithm on the basis of UAVs' cognitive difference and Gaussian fitting to improve source estimation accuracy. In the following navigation phase, an adaptive path planning algorithm is designed based on a Partially Observable Markov Decision Process (POMDP) to determine, in a distributed manner, the subsequent flying direction and moving steps of each UAV. The results of experiments conducted on two simulation platforms demonstrate that MUC-OSL outperforms existing efforts in terms of mean search time and success rate, and effectively reduces the resource consumption of UAVs.


Integrated Task and Motion Planning for Safe Legged Navigation in Partially Observable Environments

arXiv.org Artificial Intelligence

This study proposes a hierarchically integrated framework for safe task and motion planning (TAMP) of bipedal locomotion in a partially observable environment with dynamic obstacles and uneven terrain. The high-level task planner employs linear temporal logic (LTL) for a reactive game synthesis between the robot and its environment and provides a formal guarantee on navigation safety and task completion. To address environmental partial observability, a belief abstraction is employed at the high-level navigation planner to estimate the dynamic obstacles' location. Accordingly, a synthesized action planner sends a set of locomotion actions to the middle-level motion planner, while incorporating safe locomotion specifications extracted from safety theorems based on a reduced-order model (ROM) of the locomotion process. The motion planner employs the ROM to design safety criteria and a sampling algorithm to generate non-periodic motion plans that accurately track high-level actions. At the low level, a foot placement controller based on an angular-momentum linear inverted pendulum model is implemented and integrated with an ankle-actuated passivity-based controller for full-body trajectory tracking. To address external perturbations, this study also investigates safe sequential composition of the keyframe locomotion state and achieves robust transitions against external perturbations through reachability analysis. The overall TAMP framework is validated with extensive simulations and hardware experiments on bipedal walking robots Cassie and Digit designed by Agility Robotics.


Stabilized training of joint energy-based models and their practical applications

arXiv.org Artificial Intelligence

The recently proposed Joint Energy-based Model (JEM) interprets a discriminatively trained classifier $p(y|x)$ as an energy model, which is also trained as a generative model describing the distribution of the input observations $p(x)$. JEM training relies on "positive examples" (i.e., examples from the training data set) as well as on "negative examples", which are samples from the modeled distribution $p(x)$ generated by means of Stochastic Gradient Langevin Dynamics (SGLD). Unfortunately, SGLD often fails to deliver negative samples of sufficient quality during standard JEM training, which causes a very unbalanced contribution from the positive and negative examples when calculating gradients for JEM updates. As a consequence, standard JEM training is quite unstable, requiring careful tuning of hyper-parameters and frequent restarts when the training starts diverging. This makes it difficult to apply JEM to different neural network architectures, modalities, and tasks. In this work, we propose a training procedure that stabilizes SGLD-based JEM training (ST-JEM) by balancing the contribution from the positive and negative examples. We also propose adding a "regularization" term to the training objective -- the mutual information (MI) between the input observations $x$ and output labels $y$ -- which encourages the JEM classifier to make more certain decisions about output labels. We demonstrate the effectiveness of our approach on the CIFAR10 and CIFAR100 tasks. We also consider the task of classifying phonemes in a speech signal, for which we were not able to train JEM without the proposed stabilization. We show that convincing speech can be generated from the trained model. Alternatively, corrupted speech can be de-noised by bringing it closer to the modeled speech distribution using a few SGLD iterations. We also propose and discuss additional applications of the trained model.
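The Langevin update behind such negative sampling can be sketched on a toy problem. This is an illustration under stated assumptions, not the JEM or ST-JEM procedure: a one-dimensional quadratic energy with an analytic gradient stands in for the network's energy, and the function names, step size, and seed are all hypothetical.

```python
import math
import random

def energy_grad(x, mean=2.0):
    """Gradient of the toy energy E(x) = 0.5 * (x - mean)**2."""
    return x - mean

def sgld_samples(n_steps, step_size=0.1, x0=0.0, seed=0):
    """Langevin dynamics: a gradient step on the energy plus Gaussian noise."""
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_steps):
        # Noise variance equals the step size, as in the standard Langevin update.
        noise = rng.gauss(0.0, math.sqrt(step_size))
        x = x - 0.5 * step_size * energy_grad(x) + noise
        samples.append(x)
    return samples

# Discard an initial burn-in so the chain forgets its starting point.
chain = sgld_samples(50000)[1000:]
sample_mean = sum(chain) / len(chain)
```

For this energy the chain settles around the minimum at 2.0 with roughly unit variance. With a learned energy, a short or badly tuned chain may instead return samples far from the modeled distribution, which is the failure mode the abstract describes.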


Feeling Optimistic? Ambiguity Attitudes for Online Decision Making

arXiv.org Artificial Intelligence

As autonomous agents enter complex environments, it becomes more difficult to adequately model the interactions between the two. Agents must therefore cope with greater ambiguity (e.g., unknown environments, underdefined models, and vague problem definitions). Despite the consequences of ignoring ambiguity, tools for decision making under ambiguity are understudied. The general approach has been to avoid ambiguity (exploit known information) using robust methods. This work contributes ambiguity attitude graph search (AAGS), generalizing robust methods with ambiguity attitudes--the ability to trade off between seeking and avoiding ambiguity in the problem. AAGS solves online decision making problems with limited budget to learn about their environment. To evaluate this approach, AAGS is tasked with path planning in static and dynamic environments. Results demonstrate that appropriate ambiguity attitudes are dependent on the quality of information from the environment. In relatively certain environments, AAGS can readily exploit information with robust policies. Conversely, model complexity reduces the information conveyed by individual samples; this allows the risks taken by optimistic policies to achieve better performance.


Proactive Multi-Camera Collaboration For 3D Human Pose Estimation

arXiv.org Artificial Intelligence

This paper presents a multi-agent reinforcement learning (MARL) scheme for proactive Multi-Camera Collaboration in 3D Human Pose Estimation in dynamic human crowds. Traditional fixed-viewpoint multi-camera solutions for human motion capture (MoCap) are limited in capture space and susceptible to dynamic occlusions. Active camera approaches proactively control camera poses to find optimal viewpoints for 3D reconstruction. However, current methods still face challenges with credit assignment and environment dynamics. To address these issues, our proposed method introduces a novel Collaborative Triangulation Contribution Reward (CTCR) that improves convergence and alleviates multi-agent credit assignment issues resulting from using 3D reconstruction accuracy as the shared reward. Additionally, we jointly train our model with multiple world dynamics learning tasks to better capture environment dynamics and encourage anticipatory behaviors for occlusion avoidance. We evaluate our proposed method in four photo-realistic UE4 environments to ensure validity and generalizability. Empirical results show that our method outperforms fixed and active baselines in various scenarios with different numbers of cameras and humans.

Figure 1: Left: Two critical challenges in fixed camera approaches. Right: Three active cameras collaborate to best reconstruct the 3D pose of the target (marked in).

Marker-less motion capture (MoCap) has broad applications in many areas such as cinematography, medical research, virtual reality (VR), and sports. Their successes can be partly attributed to recent developments in 3D human pose estimation (HPE) techniques (Tu et al., 2020; Iskakov et al., 2019; Jafarian et al., 2019; Pavlakos et al., 2017b; Lin & Lee, 2021b). A straightforward way to solve multi-view 3D HPE is to use fixed cameras. Although convenient, this solution is less effective against dynamic occlusions. Moreover, fixed camera solutions confine tracking targets within a constrained space and are therefore less applicable to outdoor MoCap. In contrast, active cameras (Luo et al., 2018; 2019; Zhong et al., 2018a; 2019), such as ones mounted on drones, can maneuver proactively against incoming occlusions. Owing to its remarkable flexibility, the active approach has thus attracted overwhelming interest (Tallamraju et al., 2020; Ho et al., 2021; Xu et al., 2017; Kiciroglu et al., 2019; Saini et al., 2022; Cheng et al., 2018; Zhang et al., 2021).


Mastering Strategy Card Game (Legends of Code and Magic) via End-to-End Policy and Optimistic Smooth Fictitious Play

arXiv.org Artificial Intelligence

Deep Reinforcement Learning combined with Fictitious Play shows impressive results on many benchmark games, most of which are, however, single-stage. In contrast, real-world decision making problems may consist of multiple stages, where the observation spaces and the action spaces can be completely different across stages. We study the two-stage strategy card game Legends of Code and Magic and propose an end-to-end policy to address the difficulties that arise in multi-stage games. We also propose an optimistic smooth fictitious play algorithm to find the Nash Equilibrium for the two-player game. Our approach won double championships in the COG2022 competition. Extensive studies verify and show the advancement of our approach.
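For readers unfamiliar with fictitious play, the following minimal sketch shows the classical version on a two-player zero-sum matrix game. It is not the optimistic smooth variant proposed in the paper, and the function names and the choice of matching pennies are assumptions made for illustration. Each player repeatedly best-responds to the opponent's empirical action counts.

```python
def best_response(payoffs, opponent_counts):
    """Index of the action with the highest payoff against the opponent's counts."""
    scores = [sum(p * c for p, c in zip(row, opponent_counts)) for row in payoffs]
    return scores.index(max(scores))  # ties go to the lowest index

def fictitious_play(row_payoffs, n_rounds):
    """Classical fictitious play; returns both players' empirical frequencies."""
    n_rows, n_cols = len(row_payoffs), len(row_payoffs[0])
    # Column player's payoffs: negate and transpose so each row is one of its actions.
    col_payoffs = [[-row_payoffs[i][j] for i in range(n_rows)]
                   for j in range(n_cols)]
    row_counts, col_counts = [0] * n_rows, [0] * n_cols
    for _ in range(n_rounds):
        # Both players choose simultaneously from the counts seen so far.
        i = best_response(row_payoffs, col_counts)
        j = best_response(col_payoffs, row_counts)
        row_counts[i] += 1
        col_counts[j] += 1
    return ([c / n_rounds for c in row_counts],
            [c / n_rounds for c in col_counts])

MATCHING_PENNIES = [[1, -1], [-1, 1]]
row_freq, col_freq = fictitious_play(MATCHING_PENNIES, 10000)
```

In matching pennies the unique Nash equilibrium mixes both actions equally, and the empirical frequencies drift toward that mixture even though every individual move is a deterministic best response.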


On the Sample Complexity of Vanilla Model-Based Offline Reinforcement Learning with Dependent Samples

arXiv.org Artificial Intelligence

Offline reinforcement learning (offline RL) considers problems where learning is performed using only previously collected samples and is helpful for the settings in which collecting new data is costly or risky. In model-based offline RL, the learner performs estimation (or optimization) using a model constructed according to the empirical transition frequencies. We analyze the sample complexity of vanilla model-based offline RL with dependent samples in the infinite-horizon discounted-reward setting. In our setting, the samples obey the dynamics of the Markov decision process and, consequently, may have interdependencies. Under no assumption of independent samples, we provide a high-probability, polynomial sample complexity bound for vanilla model-based off-policy evaluation that requires partial or uniform coverage. We extend this result to the off-policy optimization under uniform coverage. As a comparison to the model-based approach, we analyze the sample complexity of off-policy evaluation with vanilla importance sampling in the infinite-horizon setting. Finally, we provide an estimator that outperforms the sample-mean estimator for almost deterministic dynamics that are prevalent in reinforcement learning.
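The model-based step described above, estimating transitions from empirical frequencies and then evaluating a policy in the estimated model, can be sketched as follows. This is a minimal tabular illustration under assumptions, not the paper's analysis: the two-state data set, reward table, and function names are hypothetical, and every state-action pair the policy uses is assumed to appear in the data.

```python
from collections import defaultdict

def empirical_model(transitions):
    """Estimate P(s' | s, a) from counts of observed (s, a, s') triples."""
    counts = defaultdict(lambda: defaultdict(int))
    for s, a, s_next in transitions:
        counts[(s, a)][s_next] += 1
    return {sa: {s2: c / sum(nxt.values()) for s2, c in nxt.items()}
            for sa, nxt in counts.items()}

def evaluate_policy(model, reward, policy, states, gamma=0.9, sweeps=500):
    """Iterate the Bellman equation for a fixed policy in the estimated model."""
    v = {s: 0.0 for s in states}
    for _ in range(sweeps):
        v = {s: reward[(s, policy[s])]
                + gamma * sum(p * v[s2]
                              for s2, p in model[(s, policy[s])].items())
             for s in states}
    return v

# Hypothetical logged transitions from a two-state chain with a single action.
logged = [(0, "go", 0), (0, "go", 1), (0, "go", 1), (0, "go", 1),
          (1, "go", 1), (1, "go", 1)]
model = empirical_model(logged)
values = evaluate_policy(model,
                         reward={(0, "go"): 0.0, (1, "go"): 1.0},
                         policy={0: "go", 1: "go"},
                         states=[0, 1])
```

If the policy chose a state-action pair that never occurs in the logged data, `model` would have no entry for it and the lookup would fail, which is a small-scale view of why coverage assumptions matter.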


Unsupervised Active Visual Search with Monte Carlo planning under Uncertain Detections

arXiv.org Artificial Intelligence

We propose a solution for Active Visual Search of objects in an environment, whose 2D floor map is the only known information. Our solution has three key features that make it more plausible and robust to detector failures compared to state-of-the-art methods: (i) it is unsupervised as it does not need any training sessions. (ii) During the exploration, a probability distribution on the 2D floor map is updated according to an intuitive mechanism, while an improved belief update increases the effectiveness of the agent's exploration. (iii) We incorporate the awareness that an object detector may fail into the aforementioned probability modelling by exploiting the success statistics of a specific detector. Our solution is dubbed POMP-BE-PD (Pomcp-based Online Motion Planning with Belief by Exploration and Probabilistic Detection). It uses the current pose of an agent and an RGB-D observation to learn an optimal search policy, exploiting a POMDP solved by a Monte-Carlo planning approach. On the Active Vision Database benchmark, we increase the average success rate over all the environments by a significant 35% while decreasing the average path length by 4% with respect to competing methods. Thus, our results are state-of-the-art, even without using any training procedure.
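One way to fold detector unreliability into a belief over floor-map cells is a plain Bayes update, sketched below. This is an illustration under assumptions and not the POMP-BE-PD update itself: the cell names, the `hit_rate` and `false_alarm` values, and the single-cell observation model are all hypothetical.

```python
def update_belief(belief, observed_cell, detected, hit_rate=0.8, false_alarm=0.1):
    """Bayes update of P(object in cell) after one detector reading."""
    posterior = {}
    for cell, prior in belief.items():
        if cell == observed_cell:
            # Object is in the observed cell: the detector fires with hit_rate.
            likelihood = hit_rate if detected else 1.0 - hit_rate
        else:
            # Object is elsewhere: a detection would be a false alarm.
            likelihood = false_alarm if detected else 1.0 - false_alarm
        posterior[cell] = prior * likelihood
    total = sum(posterior.values())
    return {cell: p / total for cell, p in posterior.items()}

# Uniform prior over four hypothetical cells, then a miss while looking at "A".
prior = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
after_miss = update_belief(prior, observed_cell="A", detected=False)
```

Because the detector can miss, a negative reading lowers the probability of cell "A" without driving it to zero, so the agent can still return there if later evidence points back.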