Duan, Xiaoming
Analysis of On-policy Policy Gradient Methods under the Distribution Mismatch
Wang, Weizhen, He, Jianping, Duan, Xiaoming
Policy gradient methods are among the most successful methods for solving challenging reinforcement learning problems. However, despite their empirical successes, many state-of-the-art policy gradient algorithms for discounted problems deviate from the theoretical policy gradient theorem due to a distribution mismatch. In this work, we analyze the impact of this mismatch on policy gradient methods. Specifically, we first show that for tabular parameterizations, the methods remain globally optimal under the mismatch. Then, we extend this analysis to more general parameterizations by leveraging the theory of biased stochastic gradient descent. Our findings offer new insights into the robustness of policy gradient methods as well as the gap between theoretical foundations and practical implementations.
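For reference, the mismatch in question is usually described as follows: the policy gradient theorem for the discounted objective weights states by the discounted visitation distribution, while practical implementations sample states from the undiscounted one. This is the standard textbook formulation, not necessarily the paper's exact notation:

```latex
\nabla_\theta J(\theta) \;\propto\; \sum_{s} d_\gamma^{\pi_\theta}(s) \sum_{a} \nabla_\theta \pi_\theta(a \mid s)\, Q^{\pi_\theta}(s,a),
\qquad
d_\gamma^{\pi_\theta}(s) \;=\; (1-\gamma) \sum_{t=0}^{\infty} \gamma^{t}\, \Pr(s_t = s \mid \pi_\theta),
```

where practical implementations replace the discounted distribution $d_\gamma^{\pi_\theta}$ with the undiscounted visitation distribution $d^{\pi_\theta}$.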
Inverse Reinforcement Learning with Unknown Reward Model based on Structural Risk Minimization
Qu, Chendi, He, Jianping, Duan, Xiaoming, Chen, Jiming
Inverse reinforcement learning (IRL) usually assumes that the model of the reward function is pre-specified and estimates only its parameters. However, determining a proper reward model is nontrivial: a simplistic model is less likely to contain the real reward function, while a highly complex model incurs substantial computation cost and risks overfitting. This paper addresses this trade-off in IRL model selection by introducing the structural risk minimization (SRM) method from statistical learning. SRM selects an optimal reward function class from a hypothesis set by minimizing both the estimation error and the model complexity. To formulate an SRM scheme for IRL, we estimate the policy gradient from demonstrations to serve as the empirical risk, and establish an upper bound on the Rademacher complexity of the hypothesis classes to serve as the model penalty. We further present a learning guarantee. In particular, we provide an explicit SRM scheme for the common linear weighted sum setting in IRL. Simulations demonstrate the performance and efficiency of our scheme.
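For context, a generic SRM criterion of the kind invoked here selects the hypothesis class that balances empirical risk against a complexity penalty. This is a standard statistical-learning form; the constants and the exact bound used in the paper may differ:

```latex
\mathcal{H}_{k^{\ast}} \;=\; \arg\min_{k}\; \min_{h \in \mathcal{H}_k}
\left[ \widehat{R}_n(h) \;+\; 2\,\widehat{\mathfrak{R}}_n(\mathcal{H}_k) \;+\; c\,\sqrt{\frac{\log(1/\delta)}{n}} \right],
```

where $\widehat{R}_n$ is the empirical risk (here built from policy gradients estimated from demonstrations) and $\widehat{\mathfrak{R}}_n(\mathcal{H}_k)$ is the empirical Rademacher complexity acting as the model penalty.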
Multiplayer Homicidal Chauffeur Reach-Avoid Games: A Pursuit Enclosure Function Approach
Yan, Rui, Duan, Xiaoming, Zou, Rui, He, Xin, Shi, Zongying, Bullo, Francesco
This paper presents a multiplayer Homicidal Chauffeur reach-avoid differential game involving Dubins-car pursuers and simple-motion evaders. The goal of the pursuers is to cooperatively protect a planar convex region from the evaders, who strive to reach the region. We propose a cooperative strategy for the pursuers based on subgames for multiple pursuers against one evader and on optimal task allocation. We introduce pursuit enclosure functions (PEFs) and propose a new enclosure region pursuit (ERP) winning approach that supports forward analysis for strategy synthesis in the subgames. We show that if a pursuit coalition is able to defend the region against an evader under ERP winning, then at most two pursuers in the coalition are needed. We also propose a steer-to-ERP approach to certify ERP winning and synthesize the ERP winning strategy. To implement the strategy, we introduce a positional PEF and provide the parameters, states, and strategies that ensure ERP winning for both one pursuer and two pursuers against one evader. Additionally, we formulate a binary integer program using the subgame outcomes to maximize the number of captured evaders in the pursuit task allocation. Finally, we propose a multiplayer receding-horizon strategy in which ERP winning is checked in each horizon, the task is allocated, and the strategies of the pursuers are determined. Numerical examples are provided to illustrate the results.
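As an illustration of the task-allocation step, a binary integer program of the kind described can be sketched as follows, where $x_{ce}\in\{0,1\}$ assigns pursuit coalition $c$ to evader $e$ and $w_{ce}=1$ if the coalition wins the ERP subgame against that evader. The notation here is ours, not the paper's:

```latex
\max_{x}\; \sum_{c}\sum_{e} w_{ce}\, x_{ce}
\quad \text{s.t.} \quad
\sum_{c} x_{ce} \le 1 \;\;\forall e,
\qquad
\sum_{e} \sum_{c \ni i} x_{ce} \le 1 \;\;\forall i,
\qquad
x_{ce} \in \{0,1\},
```

so that each evader is assigned to at most one coalition and each pursuer $i$ belongs to at most one active coalition.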
Affordance-Driven Next-Best-View Planning for Robotic Grasping
Zhang, Xuechao, Wang, Dong, Han, Sun, Li, Weichuang, Zhao, Bin, Wang, Zhigang, Duan, Xiaoming, Fang, Chongrong, Li, Xuelong, He, Jianping
Grasping occluded objects in cluttered environments is an essential component of complex robotic manipulation tasks. In this paper, we introduce an AffordanCE-driven Next-Best-View planning policy (ACE-NBV) that tries to find a feasible grasp for the target object by continuously observing the scene from new viewpoints. This policy is motivated by the observation that the grasp affordances of an occluded object can be measured better when the observation view direction coincides with the grasp direction. Specifically, our method leverages novel view imagery to predict grasp affordances under previously unobserved views, and selects the next observation view based on the highest imagined grasp quality of the target object. Experimental results in simulation and on a real robot demonstrate the effectiveness of the proposed affordance-driven next-best-view planning policy. Project page: https://sszxc.net/ace-nbv/.
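A minimal sketch of the view-selection idea is shown below; the function names and the scoring model are placeholders of our own, not the ACE-NBV implementation, which uses learned affordance and novel-view-imagery networks.

```python
# Minimal sketch of affordance-driven next-best-view selection (illustrative only;
# function names and the scoring model are assumptions, not the ACE-NBV code).
import numpy as np

def imagined_grasp_quality(scene_latent: np.ndarray, view_direction: np.ndarray) -> float:
    """Stand-in for a network that imagines the target object from `view_direction`
    and predicts the best grasp quality achievable under that view."""
    # Placeholder score; in the real system this comes from a learned affordance model.
    preferred = np.array([0.0, 1.0, 0.5])
    return float(view_direction @ preferred /
                 (np.linalg.norm(view_direction) * np.linalg.norm(preferred)))

def select_next_best_view(scene_latent: np.ndarray, candidate_views: list) -> np.ndarray:
    """Pick the candidate viewpoint whose imagined grasp quality is highest."""
    scores = [imagined_grasp_quality(scene_latent, v) for v in candidate_views]
    return candidate_views[int(np.argmax(scores))]

if __name__ == "__main__":
    latent = np.zeros(128)  # encoded observation of the cluttered scene
    angles = np.linspace(0.0, 2.0 * np.pi, 8, endpoint=False)
    views = [np.array([np.cos(a), np.sin(a), 0.5]) for a in angles]
    print("next view direction:", select_next_best_view(latent, views))
```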
HiCRISP: A Hierarchical Closed-Loop Robotic Intelligent Self-Correction Planner
Ming, Chenlin, Lin, Jiacheng, Fong, Pangkit, Wang, Han, Duan, Xiaoming, He, Jianping
The integration of Large Language Models (LLMs) into robotics has revolutionized human-robot interactions and autonomous task planning. However, these systems are often unable to self-correct during task execution, which hinders their adaptability in dynamic real-world environments. To address this issue, we present a Hierarchical Closed-loop Robotic Intelligent Self-correction Planner (HiCRISP), an innovative framework that enables robots to correct errors within individual steps during task execution. HiCRISP actively monitors and adapts the task execution process, addressing both high-level planning and low-level action errors.
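The closed-loop, step-level correction idea can be sketched as below; this is an illustrative skeleton with stand-in functions, whereas the actual HiCRISP framework drives planning and correction with LLMs and richer error handling.

```python
# Minimal sketch of a hierarchical closed-loop self-correction loop (illustrative only;
# `high_level_plan`, `execute`, and `correct` are hypothetical stand-ins).
from dataclasses import dataclass

@dataclass
class Step:
    description: str
    attempts: int = 0

def high_level_plan(task: str) -> list:
    # Stand-in for a planner that decomposes the task into executable steps.
    return [Step("locate object"), Step("grasp object"), Step("place object")]

def execute(step: Step) -> bool:
    # Stand-in for low-level skill execution; returns success/failure.
    step.attempts += 1
    return step.attempts >= 2  # pretend each step succeeds on the second try

def correct(step: Step) -> Step:
    # Stand-in for step-level self-correction (e.g., re-prompting, re-grasping).
    return step

def run(task: str, max_retries: int = 3) -> bool:
    for step in high_level_plan(task):
        while not execute(step):
            if step.attempts >= max_retries:
                return False          # escalate: replan at the high level in a fuller system
            step = correct(step)      # low-level correction within the current step
    return True

if __name__ == "__main__":
    print("task succeeded:", run("tidy the table"))
```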
Learning-Based Motion Planning with Mixture Density Networks
Wang, Yinghan, Duan, Xiaoming, He, Jianping
The trade-off between computation time and path optimality is a key consideration in motion planning algorithms. While classical sampling-based algorithms fall short in computational efficiency for high-dimensional planning, learning-based methods have shown great potential in achieving time-efficient and optimal motion planning. State-of-the-art learning-based motion planning algorithms use paths generated by sampling-based methods as expert supervision data and train networks via regression techniques. However, these methods often overlook the important multimodal property of the optimal paths in the training set, making them incapable of finding good paths in some scenarios. In this paper, we propose a Multimodal Neuron Planner (MNP) based on mixture density networks that explicitly takes into account the multimodality of the training data and simultaneously achieves time efficiency and path optimality. For environments represented by a point cloud, MNP first efficiently compresses the point cloud into a latent vector using encoding networks suitable for processing point clouds. We then design multimodal planning networks that enable MNP to learn and predict multiple optimal solutions. Simulation results show that our method outperforms the state-of-the-art learning-based method MPNet and the advanced sampling-based methods IRRT* and BIT*.
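For readers unfamiliar with mixture density networks, the sketch below shows the kind of MDN output head and negative log-likelihood loss this approach builds on. The layer sizes and the encoder are assumptions for illustration, not the paper's architecture.

```python
# Minimal sketch of a mixture density network (MDN) output head (illustrative only;
# dimensions and the upstream point-cloud encoder are placeholders).
import torch
import torch.nn as nn

class MDNHead(nn.Module):
    def __init__(self, in_dim: int, out_dim: int, n_components: int):
        super().__init__()
        self.k, self.d = n_components, out_dim
        self.pi = nn.Linear(in_dim, n_components)                    # mixture weights
        self.mu = nn.Linear(in_dim, n_components * out_dim)          # component means
        self.log_sigma = nn.Linear(in_dim, n_components * out_dim)   # per-dim log std

    def forward(self, h):
        log_pi = torch.log_softmax(self.pi(h), dim=-1)
        mu = self.mu(h).view(-1, self.k, self.d)
        sigma = torch.exp(self.log_sigma(h)).view(-1, self.k, self.d)
        return log_pi, mu, sigma

def mdn_nll(log_pi, mu, sigma, target):
    """Negative log-likelihood of `target` under the predicted Gaussian mixture."""
    dist = torch.distributions.Normal(mu, sigma)
    log_prob = dist.log_prob(target.unsqueeze(1)).sum(dim=-1)        # (batch, k)
    return -torch.logsumexp(log_pi + log_prob, dim=-1).mean()

if __name__ == "__main__":
    head = MDNHead(in_dim=64, out_dim=2, n_components=5)             # e.g., 2-D next waypoints
    h = torch.randn(8, 64)                                           # latent from an encoder
    target = torch.randn(8, 2)                                       # expert waypoint labels
    loss = mdn_nll(*head(h), target)
    loss.backward()
    print("MDN NLL:", float(loss))
```

Predicting a full mixture rather than a single regressed point is what lets the planner represent several distinct optimal paths for the same scene.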
Control Input Inference of Mobile Agents under Unknown Objective
Qu, Chendi, He, Jianping, Duan, Xiaoming, Wu, Shukun
Trajectory and control secrecy is an important issue in robotics security. This paper proposes a novel algorithm for inferring the control input of a mobile agent without knowing its control objective. Specifically, the algorithm first estimates the target state by applying external perturbations. We then identify the objective function based on inverse optimal control, providing a well-posedness proof and an identifiability analysis. Next, we obtain the optimal estimate of the control horizon using binary search. Finally, the agent's control optimization problem is reconstructed and solved to predict its input. Simulations illustrate the efficiency and performance of the algorithm.
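The horizon-estimation step can be sketched as a binary search over candidate horizons, assuming the fit error is unimodal in the horizon length; `fit_error` below is a hypothetical stand-in for re-solving the reconstructed optimal control problem and scoring it against the observed trajectory.

```python
# Minimal sketch of binary search over the control horizon (illustrative only).
def fit_error(horizon: int) -> float:
    # Stand-in: in the real pipeline this solves the reconstructed optimization
    # with the candidate horizon and returns the trajectory prediction error.
    true_horizon = 17
    return abs(horizon - true_horizon)

def estimate_horizon(lo: int, hi: int) -> int:
    """Binary search assuming the fit error is unimodal in the horizon length."""
    while lo < hi:
        mid = (lo + hi) // 2
        if fit_error(mid) <= fit_error(mid + 1):
            hi = mid          # minimum lies at mid or to its left
        else:
            lo = mid + 1      # minimum lies to the right of mid
    return lo

if __name__ == "__main__":
    print("estimated control horizon:", estimate_horizon(1, 100))
```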
Reinforcement Learning with Temporal-Logic-Based Causal Diagrams
Paliwal, Yash, Roy, Rajarshi, Gaglione, Jean-Raphaël, Baharisangari, Nasim, Neider, Daniel, Duan, Xiaoming, Topcu, Ufuk, Xu, Zhe
We study a class of reinforcement learning (RL) tasks where the objective of the agent is to accomplish temporally extended goals. In this setting, a common approach is to represent the tasks as deterministic finite automata (DFA) and integrate them into the state space of the RL algorithm. However, while these machines model the reward function, they often overlook the causal knowledge about the environment. To address this limitation, we propose the Temporal-Logic-based Causal Diagram (TL-CD) in RL, which captures the temporal causal relationships between different properties of the environment. We exploit the TL-CD to devise an RL algorithm in which the agent requires significantly less exploration of the environment. To this end, based on a TL-CD and a task DFA, we identify configurations where the agent can determine the expected rewards early during exploration. Through a series of case studies, we demonstrate the benefits of using TL-CDs, in particular the faster convergence of the algorithm to an optimal policy due to reduced exploration of the environment.
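To make the DFA-in-the-state-space idea concrete, the toy below tracks a task automaton alongside the environment labels, which is the product construction an RL agent would learn over. The TL-CD mechanism itself, which marks product states whose eventual reward is already causally determined so exploration can be cut short, is only described in the comments and not reproduced here.

```python
# Minimal sketch of tracking a task DFA alongside environment labels (illustrative only;
# the toy task and labels are assumptions, not the paper's case studies).
from dataclasses import dataclass

@dataclass(frozen=True)
class DFA:
    # Toy task: "observe label 'a', then label 'b'", states 0 -> 1 -> 2 (accepting).
    initial: int = 0
    accepting_state: int = 2

    def step(self, q: int, label: str) -> int:
        if q == 0 and label == "a":
            return 1
        if q == 1 and label == "b":
            return 2
        return q

    def accepting(self, q: int) -> bool:
        return q == self.accepting_state

if __name__ == "__main__":
    dfa = DFA()
    q = dfa.initial
    for label in ["c", "a", "c", "b"]:      # labels emitted by the environment at each step
        q = dfa.step(q, label)
        reward = 1.0 if dfa.accepting(q) else 0.0
        # An RL agent learns over the product state (env_state, q); a TL-CD can mark
        # product states whose eventual reward is already causally determined,
        # letting the agent terminate exploration early there.
        print(label, "->", q, "reward:", reward)
```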
Revisiting Estimation Bias in Policy Gradients for Deep Reinforcement Learning
Pan, Haoxuan, Ye, Deheng, Duan, Xiaoming, Fu, Qiang, Yang, Wei, He, Jianping, Sun, Mingfei
We revisit the estimation bias in policy gradients for the discounted episodic Markov decision process (MDP) from the Deep Reinforcement Learning (DRL) perspective. The objective is formulated theoretically as the expected return discounted over the time horizon. One of the major policy gradient biases is the state distribution shift: the state distribution used to estimate the gradients differs from the theoretical formulation in that it does not take into account the discount factor. Existing discussions of the influence of this bias are limited to the tabular and softmax cases in the literature. Therefore, in this paper, we extend the analysis to the DRL setting where the policy is parameterized, and demonstrate theoretically how this bias can lead to suboptimal policies. We then discuss why empirically inaccurate implementations with a shifted state distribution can still be effective. We show that, despite such state distribution shift, the policy gradient estimation bias can be reduced in the following three ways: 1) a small learning rate; 2) an adaptive-learning-rate-based optimizer; and 3) KL regularization. Specifically, we show that a smaller learning rate, or an adaptive learning rate such as that used by the Adam and RMSProp optimizers, makes the policy optimization robust to the bias. We further draw connections between optimizers and optimization regularization to show that both the KL and the reverse KL regularization can significantly rectify this bias. Moreover, we provide extensive experiments on continuous control tasks to support our analysis. Our paper sheds light on how successful policy gradient algorithms optimize policies in the DRL setting, and contributes insights into practical issues in DRL.
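As a concrete example of the third mitigation, a KL-regularized policy-gradient surrogate can be written as below. The penalty coefficient and the choice of KL direction are assumptions for illustration, not the paper's exact setup.

```python
# Minimal sketch of a KL-regularized policy-gradient surrogate loss (illustrative only).
import torch

def pg_loss_with_kl(log_probs, old_log_probs, advantages, kl_coef=0.01):
    """Vanilla policy-gradient surrogate plus a KL penalty toward the behavior policy,
    one of the mechanisms identified as reducing the estimation bias."""
    pg_term = -(log_probs * advantages).mean()
    # Monte Carlo estimate of KL(old || new) from actions sampled under the old policy.
    kl_term = (old_log_probs - log_probs).mean()
    return pg_term + kl_coef * kl_term

if __name__ == "__main__":
    torch.manual_seed(0)
    logits_new = torch.randn(32, 4, requires_grad=True)
    actions = torch.randint(0, 4, (32,))
    log_probs = torch.log_softmax(logits_new, dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
    logits_old = logits_new.detach() + 0.1 * torch.randn(32, 4)
    old_log_probs = torch.log_softmax(logits_old, dim=-1).gather(1, actions.unsqueeze(1)).squeeze(1)
    advantages = torch.randn(32)
    loss = pg_loss_with_kl(log_probs, old_log_probs, advantages)
    loss.backward()
    print("surrogate loss:", float(loss))
```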
Robust Pandemic Control Synthesis with Formal Specifications: A Case Study on COVID-19 Pandemic
Xu, Zhe, Duan, Xiaoming
Pandemics can bring a range of devastating consequences to public health and the world economy. Identifying the most effective control strategies has been an imperative task around the world. Various public health control strategies have been proposed and tested against pandemic diseases (e.g., COVID-19). We study two specific pandemic control models: the susceptible, exposed, infectious, recovered (SEIR) model with vaccination control, and the SEIR model with shield immunity control. We express the pandemic control requirements as metric temporal logic (MTL) formulas. We then develop an iterative approach for synthesizing the optimal control strategies subject to the MTL specifications. We provide simulation results in two different scenarios for robust control of the COVID-19 pandemic: one for vaccination control and another for shield immunity control, with the model parameters estimated from data in Lombardy, Italy. The results show that the proposed synthesis approach can generate control inputs such that the time-varying numbers of individuals in each category (e.g., infectious, immune) satisfy the MTL specifications with robustness against initial state and parameter uncertainties.
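The sketch below simulates a standard SEIR model with a vaccination control input and checks a simple always-below-threshold requirement on the infectious population, in the spirit of an MTL specification such as G_[0,150] (I(t) <= 1000). The parameter values are placeholders, not the Lombardy estimates used in the paper.

```python
# Minimal sketch of SEIR dynamics with vaccination control (illustrative parameters only).
import numpy as np

def seir_step(state, u, beta=0.3, sigma=0.2, gamma=0.1, dt=1.0):
    """One Euler step of SEIR dynamics; `u` is the vaccination rate applied to susceptibles."""
    S, E, I, R = state
    N = S + E + I + R
    dS = -beta * S * I / N - u * S
    dE = beta * S * I / N - sigma * E
    dI = sigma * E - gamma * I
    dR = gamma * I + u * S
    return state + dt * np.array([dS, dE, dI, dR])

if __name__ == "__main__":
    state = np.array([9990.0, 5.0, 5.0, 0.0])
    u_schedule = [0.0] * 30 + [0.01] * 120       # candidate vaccination control sequence
    infectious = []
    for u in u_schedule:
        state = seir_step(state, u)
        infectious.append(state[2])
    # Check the always-below-threshold requirement on the simulated trajectory.
    print("peak infectious:", round(max(infectious), 1),
          "spec satisfied:", max(infectious) <= 1000.0)
```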