 Peng, Zhenghao


Human-AI Shared Control via Policy Dissection

arXiv.org Artificial Intelligence

Human-AI shared control allows humans to interact and collaborate with autonomous agents to accomplish control tasks in complex environments. Previous Reinforcement Learning (RL) methods attempted goal-conditioned designs to achieve human-controllable policies at the cost of redesigning the reward function and training paradigm. Inspired by the neuroscience approach to investigating the motor cortex in primates, we develop a simple yet effective frequency-based approach called Policy Dissection to align the intermediate representation of the learned neural controller with the kinematic attributes of the agent's behavior. Without modifying the neural controller or retraining the model, the proposed approach can convert a given RL-trained policy into a human-controllable policy. We evaluate the proposed approach on many RL tasks such as autonomous driving and locomotion. The experiments show that the human-AI shared control system achieved by Policy Dissection in the driving task can substantially improve performance and safety in unseen traffic scenes. With a human in the inference loop, the locomotion robots also exhibit versatile controllable motion skills even though they are only trained to move forward. Our results suggest a promising direction for implementing human-AI shared autonomy through interpreting the learned representation of the autonomous agents. Code and demo videos are available at https://metadriverse.github.io/policydissect.
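
The core mechanism described above is matching the rhythm of individual hidden units to the rhythm of a kinematic attribute. The following is a minimal, hedged sketch of that frequency-based matching step on synthetic data; the function names, the FFT-based matching rule, and the toy signals are illustrative assumptions, not the released implementation (see the repository above for the actual method).

```python
# Hedged sketch: identify the hidden unit whose activation shares the dominant
# frequency of a kinematic attribute. All names and signals here are hypothetical.
import numpy as np

def dominant_frequency(signal, dt=0.02):
    """Return the dominant (non-DC) frequency of a 1-D signal via the FFT."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    return freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC bin

def find_motor_unit(hidden_acts, kinematic, dt=0.02):
    """hidden_acts: (T, H) activations of one layer recorded over an episode.
    kinematic: (T,) a kinematic attribute of the agent (e.g. yaw rate).
    Returns the unit whose dominant frequency is closest to the attribute's."""
    target_f = dominant_frequency(kinematic, dt)
    unit_f = np.array([dominant_frequency(hidden_acts[:, i], dt)
                       for i in range(hidden_acts.shape[1])])
    return int(np.argmin(np.abs(unit_f - target_f)))

# Synthetic demo: unit 7 oscillates at the same 4 Hz as the kinematic signal.
T, H, dt = 500, 32, 0.02
t = np.arange(T) * dt
acts = np.stack([np.sin(2 * np.pi * 0.5 * (i + 1) * t) for i in range(H)], axis=1)
kin = np.sin(2 * np.pi * 4.0 * t + 0.3)
print("identified unit:", find_motor_unit(acts, kin, dt))  # -> 7
# At inference time, offsetting the identified unit's activation would then be used
# to evoke the associated behavior, which is the human-controllable part of the idea.
```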


Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining

arXiv.org Artificial Intelligence

Deep visuomotor policy learning, which aims to map raw visual observations to actions, achieves promising results in control tasks such as robotic manipulation and autonomous driving. However, it requires a huge number of online interactions with the training environment, which limits its real-world application. Compared to the popular unsupervised feature learning for visual recognition, feature pretraining for visuomotor control tasks is much less explored. In this work, we aim to pretrain policy representations for driving tasks by watching hours of uncurated YouTube videos. Specifically, we train an inverse dynamics model with a small amount of labeled data and use it to predict action labels for all the YouTube video frames. A new contrastive policy pretraining method is then developed to learn action-conditioned features from the video frames with pseudo action labels. Experiments show that the resulting action-conditioned features bring substantial improvements to the downstream reinforcement learning and imitation learning tasks, outperforming weights pretrained by previous unsupervised learning methods and ImageNet pretrained weights. Code, model weights, and data are available at: https://metadriverse.github.io/ACO.
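
The pipeline above has two ingredients that lend themselves to a short illustration: pseudo action labels produced by an inverse dynamics model, and a contrastive loss conditioned on those labels. Below is a hedged PyTorch sketch of the second ingredient; the tiny encoder, the supervised-contrastive-style loss, and all tensor shapes are simplifying assumptions rather than the released ACO implementation.

```python
# Hedged sketch: contrastive pretraining where frames that share the same pseudo
# action label are treated as positives. Architecture and loss form are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """Tiny stand-in for the visual backbone that would be pretrained."""
    def __init__(self, feat_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 5, stride=2), nn.ReLU(),
            nn.Conv2d(16, 32, 5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim))

    def forward(self, x):
        return F.normalize(self.net(x), dim=1)

def action_conditioned_contrastive_loss(features, pseudo_actions, temperature=0.1):
    """Frames sharing a pseudo action label (from an inverse dynamics model run on
    unlabeled video) are pulled together; all other frames are pushed apart."""
    sim = features @ features.t() / temperature                   # (B, B) similarities
    mask = pseudo_actions.unsqueeze(0) == pseudo_actions.unsqueeze(1)
    mask.fill_diagonal_(False)                                     # exclude self-pairs
    logits = sim - torch.eye(len(sim), device=sim.device) * 1e9    # remove self from softmax
    log_prob = F.log_softmax(logits, dim=1)
    pos_per_row = mask.sum(1).clamp(min=1)
    return -(log_prob * mask).sum(1).div(pos_per_row).mean()

encoder = Encoder()
frames = torch.randn(16, 3, 64, 64)            # a batch of video frames
pseudo_actions = torch.randint(0, 4, (16,))    # discretized pseudo action labels
loss = action_conditioned_contrastive_loss(encoder(frames), pseudo_actions)
loss.backward()
print(float(loss))
```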


Safe Driving via Expert Guided Policy Optimization

arXiv.org Artificial Intelligence

When learning common skills like driving, beginners usually have domain experts standing by to ensure the safety of the learning process. We formulate such a learning scheme as Expert-in-the-loop Reinforcement Learning (ERL), where a guardian is introduced to safeguard the exploration of the learning agent. While allowing sufficient exploration in the uncertain environment, the guardian intervenes in dangerous situations and demonstrates the correct actions to avoid potential accidents. Thus, ERL enables both exploration and the expert's partial demonstrations as two training sources. Following such a setting, we develop a novel Expert Guided Policy Optimization (EGPO) method which integrates the guardian into the loop of reinforcement learning. The guardian is composed of an expert policy to generate demonstrations and a switch function to decide when to intervene. In particular, a constrained optimization technique is used to tackle the trivial solution in which the agent deliberately behaves dangerously to deceive the expert into taking over. An offline RL technique is further used to learn from the partial demonstrations generated by the expert. Safe driving experiments show that our method achieves superior training-time and test-time safety, outperforms baselines by a substantial margin in sample efficiency, and preserves generalizability to unseen environments at test time. Demo video and source code are available at: https://decisionforce.github.io/EGPO/
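
A short sketch of the guardian mechanism may help: a novice acts, a switch function decides whether the proposed step is dangerous, and the expert takes over and demonstrates when it is. The environment, both policies, and the switch rule below are toy stand-ins introduced only for illustration, not the paper's implementation.

```python
# Hedged sketch of expert-in-the-loop data collection. Everything here (dynamics,
# policies, safety band) is a hypothetical stand-in for illustration.
import random

def novice_policy(state):      # the learning agent (stand-in: random jitter)
    return random.uniform(-1.0, 1.0)

def expert_policy(state):      # the guardian's expert (stand-in: drive state toward 0)
    return -0.5 * state

def switch_function(state, action):
    """Intervene when the proposed action would push the agent out of a safe band."""
    return abs(state + action) > 2.0

def rollout(steps=50):
    state, buffer, interventions = 0.0, [], 0
    for _ in range(steps):
        action = novice_policy(state)
        took_over = switch_function(state, action)
        if took_over:                                # guardian takes over and demonstrates
            action = expert_policy(state)
            interventions += 1
        buffer.append((state, action, took_over))    # partial demonstration data
        state = state + action                       # toy dynamics
    return buffer, interventions

data, n = rollout()
print(f"{n} interventions out of {len(data)} steps")
# The intervention count would then enter a constrained objective, so the novice is
# penalized for deliberately triggering takeovers to exploit the expert.
```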


Non-local Policy Optimization via Diversity-regularized Collaborative Exploration

arXiv.org Machine Learning

Conventional Reinforcement Learning (RL) algorithms usually have a single agent learning to solve the task independently. As a result, the agent can only explore a limited part of the state-action space, while the learned behavior is highly correlated with the agent's previous experience, making the training prone to a local minimum. In this work, we empower RL with the capability of teamwork and propose a novel non-local policy optimization framework called Diversity-regularized Collaborative Exploration (DiCE). DiCE utilizes a group of heterogeneous agents to explore the environment simultaneously and share the collected experiences. A regularization mechanism is further designed to maintain the diversity of the team and modulate the exploration. We implement the framework in both on-policy and off-policy settings, and the experimental results show that DiCE can achieve substantial improvements over the baselines in the MuJoCo locomotion tasks.
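
As a rough illustration of the diversity regularization idea, the sketch below pushes each policy in a small team away from the team's mean action on shared states while still optimizing a task objective. The network sizes, the placeholder task loss, and the regularization weight are assumptions made for this example, not the DiCE formulation.

```python
# Hedged sketch: a team of policies shares experience and is regularized to stay
# diverse. The regularizer form and all coefficients are illustrative assumptions.
import torch
import torch.nn as nn

team = nn.ModuleList([nn.Linear(8, 2) for _ in range(4)])   # 4 tiny deterministic policies
states = torch.randn(64, 8)                                  # experience shared by the team

actions = torch.stack([pi(states) for pi in team])           # (team, batch, act_dim)
team_mean = actions.mean(dim=0, keepdim=True)
diversity = ((actions - team_mean) ** 2).mean()              # larger = more diverse team

task_loss = sum((pi(states) ** 2).mean() for pi in team)     # placeholder task objective
loss = task_loss - 0.1 * diversity                           # encourage diversity (weight assumed)
loss.backward()
print(float(diversity))
```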


Novel Policy Seeking with Constrained Optimization

arXiv.org Artificial Intelligence

In this work, we address the problem of learning to seek novel policies in reinforcement learning tasks. Instead of following the multi-objective framework used in previous methods, we propose to rethink the problem from a novel perspective of constrained optimization. We first introduce a new metric to evaluate the difference between policies and then design two practical novel policy seeking methods following the new perspective, namely the Constrained Task Novel Bisector (CTNB) and the Interior Policy Differentiation (IPD), corresponding to the feasible direction method and the interior point method commonly used in constrained optimization. Experimental comparisons on the MuJoCo control suite show that our methods achieve substantial improvements over previous novelty-seeking methods in terms of both novelty and primal task performance.
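
The constrained-optimization view can be illustrated with a toy interior-point-style shaping: treat a novelty metric as an inequality constraint and add a log barrier that diverges as the policy approaches the constraint boundary. The novelty metric, threshold, and barrier weight below are illustrative assumptions, not the exact CTNB or IPD formulations.

```python
# Hedged sketch: novelty seeking as a constrained problem with a log-barrier term.
# The metric and all constants here are hypothetical.
import numpy as np

def novelty(policy_actions, reference_actions):
    """Mean action distance to a reference policy on the same states (toy metric)."""
    return float(np.mean(np.abs(policy_actions - reference_actions)))

def barrier_shaped_return(task_return, nov, threshold=0.5, weight=0.1):
    """Interior-point-style shaping: feasible iff nov > threshold; the log barrier
    diverges as the novelty approaches the threshold from above."""
    if nov <= threshold:
        return -np.inf        # infeasible: not novel enough w.r.t. prior policies
    return task_return + weight * np.log(nov - threshold)

new_policy_actions = np.array([0.9, 1.1, 0.7])
reference_actions = np.array([0.1, 0.2, 0.0])
nov = novelty(new_policy_actions, reference_actions)
print(nov, barrier_shaped_return(task_return=100.0, nov=nov))
```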


AXNet: ApproXimate computing using an end-to-end trainable neural network

arXiv.org Machine Learning

Neural network based approximate computing is a universal architecture promising to gain tremendous energy efficiency for many error-resilient applications. To guarantee the approximation quality, existing works deploy two neural networks (NNs), i.e., an approximator and a predictor. The approximator provides the approximate results, while the predictor predicts whether the input data is safe to approximate under the given quality requirement. However, it is non-trivial and time-consuming to make these two neural networks, which have different optimization objectives, coordinate by training them separately. This paper proposes a novel neural network structure, AXNet, that fuses the two NNs into a holistic, end-to-end trainable NN. Leveraging the philosophy of multi-task learning, AXNet can tremendously improve the invocation (the proportion of safe-to-approximate samples) and reduce the approximation error. The training effort also decreases significantly. Experimental results show 50.7% more invocation and a substantial reduction of training time compared to the existing neural network based approximate computing framework.
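
The fusion idea can be sketched as a shared trunk with two heads trained jointly in a multi-task fashion: one head regresses the approximate result and the other classifies whether the sample is safe to approximate. The layer sizes, the toy target function, the quality bound, and the loss weights below are assumptions for illustration, not the AXNet architecture itself.

```python
# Hedged sketch: one network, two heads (approximator + safety predictor),
# trained with a multi-task loss. All sizes and weights are hypothetical.
import torch
import torch.nn as nn
import torch.nn.functional as F

class AXNetSketch(nn.Module):
    def __init__(self, in_dim=8, hidden=32, out_dim=1):
        super().__init__()
        self.trunk = nn.Sequential(nn.Linear(in_dim, hidden), nn.ReLU())
        self.approximator = nn.Linear(hidden, out_dim)  # approximate result
        self.predictor = nn.Linear(hidden, 1)           # safe-to-approximate logit

    def forward(self, x):
        h = self.trunk(x)
        return self.approximator(h), self.predictor(h)

model = AXNetSketch()
x = torch.randn(128, 8)
target = x.sum(dim=1, keepdim=True)                     # toy exact function to approximate
approx, safe_logit = model(x)
# A sample counts as "safe" if the approximation error is within the quality bound.
safe_label = ((approx - target).abs() < 0.5).float().detach()
loss = F.mse_loss(approx, target) \
       + 0.5 * F.binary_cross_entropy_with_logits(safe_logit, safe_label)
loss.backward()
print(float(loss))
```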


Approximate Random Dropout

arXiv.org Machine Learning

The training phase of a deep neural network (DNN) consumes enormous processing time and energy. Compression techniques that leverage the sparsity of DNNs to accelerate inference, however, can hardly be used in the training phase, because training involves dense matrix multiplications on GPGPUs, which require a regular and structured data layout. In this paper, we exploit the sparsity of DNNs resulting from the random dropout technique to eliminate the unnecessary computation and data access for the dropped neurons or synapses in the training phase. Experimental results for MLP and LSTM models on standard benchmarks show that the proposed Approximate Random Dropout can reduce the training time by half on average with negligible accuracy loss.
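
The motivation can be illustrated by contrasting element-wise dropout, which still pays for the full dense matmul, with a structured variant that drops whole output neurons for the batch so the multiplication only touches the kept columns of the weight matrix. The dropout pattern and sizes below are illustrative assumptions, not the paper's specific patterns or GPU kernels.

```python
# Hedged sketch: structured dropout lets the dense matmul skip dropped neurons
# entirely, instead of multiplying and then zeroing. Pattern choice is illustrative.
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 256))      # batch of activations
W = rng.standard_normal((256, 512))     # layer weight

# Conventional dropout: element-wise mask on the output; all 64x256x512 MACs still run.
full = x @ W
mask = rng.random(full.shape) > 0.5
dropped_conventional = full * mask / 0.5

# Structured variant: drop half of the output neurons for the whole batch, so the
# matmul only touches the kept columns of W (roughly half the work on regular hardware).
keep = rng.permutation(512)[:256]
dropped_structured = np.zeros_like(full)
dropped_structured[:, keep] = (x @ W[:, keep]) / 0.5
print(dropped_conventional.shape, dropped_structured.shape)
```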