Chen, Jiayu
Option-Aware Adversarial Inverse Reinforcement Learning for Robotic Control
Chen, Jiayu, Lan, Tian, Aggarwal, Vaneet
Hierarchical Imitation Learning (HIL) has been proposed to recover highly-complex behaviors in long-horizon tasks from expert demonstrations by modeling the task hierarchy with the option framework. Existing methods either overlook the causal relationship between the subtask and its corresponding policy or cannot learn the policy in an end-to-end fashion, which leads to suboptimality. In this work, we develop a novel HIL algorithm based on Adversarial Inverse Reinforcement Learning and adapt it with the Expectation-Maximization algorithm in order to directly recover a hierarchical policy from the unannotated demonstrations. Further, we introduce a directed information term to the objective function to enhance the causality and propose a Variational Autoencoder framework for learning with our objectives in an end-to-end fashion. Theoretical justifications and evaluations on challenging robotic control tasks are provided to show the superiority of our algorithm. The codes are available at https://github.com/LucasCJYSDL/HierAIRL.
DeepFreight: Integrating Deep Reinforcement Learning and Mixed Integer Programming for Multi-transfer Truck Freight Delivery
Chen, Jiayu, Umrawal, Abhishek K., Lan, Tian, Aggarwal, Vaneet
With the freight delivery demands and shipping costs increasing rapidly, intelligent control of fleets to enable efficient and cost-conscious solutions becomes an important problem. In this paper, we propose DeepFreight, a model-free deep-reinforcement-learning-based algorithm for multi-transfer freight delivery, which includes two closely-collaborative components: truck-dispatch and package-matching. Specifically, a deep multi-agent reinforcement learning framework called QMIX is leveraged to learn a dispatch policy, with which we can obtain the multi-step joint vehicle dispatch decisions for the fleet with respect to the delivery requests. Then an efficient multi-transfer matching algorithm is executed to assign the delivery requests to the trucks. Also, DeepFreight is integrated with a Mixed-Integer Linear Programming optimizer for further optimization. The evaluation results show that the proposed system is highly scalable and ensures a 100\% delivery success while maintaining low delivery-time and fuel consumption. The codes are available at https://github.com/LucasCJYSDL/DeepFreight.
Asynchronous Multi-Agent Reinforcement Learning for Efficient Real-Time Multi-Robot Cooperative Exploration
Yu, Chao, Yang, Xinyi, Gao, Jiaxuan, Chen, Jiayu, Li, Yunfei, Liu, Jijia, Xiang, Yunfei, Huang, Ruixin, Yang, Huazhong, Wu, Yi, Wang, Yu
We consider the problem of cooperative exploration where multiple robots need to cooperatively explore an unknown region as fast as possible. Multi-agent reinforcement learning (MARL) has recently become a trending paradigm for solving this challenge. However, existing MARL-based methods adopt action-making steps as the metric for exploration efficiency by assuming all the agents are acting in a fully synchronous manner: i.e., every single agent produces an action simultaneously and every single action is executed instantaneously at each time step. Despite its mathematical simplicity, such a synchronous MARL formulation can be problematic for real-world robotic applications. It can be typical that different robots may take slightly different wall-clock times to accomplish an atomic action or even periodically get lost due to hardware issues. Simply waiting for every robot being ready for the next action can be particularly time-inefficient. Therefore, we propose an asynchronous MARL solution, Asynchronous Coordination Explorer (ACE), to tackle this real-world challenge. We first extend a classical MARL algorithm, multi-agent PPO (MAPPO), to the asynchronous setting and additionally apply action-delay randomization to enforce the learned policy to generalize better to varying action delays in the real world. Moreover, each navigation agent is represented as a team-size-invariant CNN-based policy, which greatly benefits real-robot deployment by handling possible robot lost and allows bandwidth-efficient intra-agent communication through low-dimensional CNN features. We first validate our approach in a grid-based scenario. Both simulation and real-robot results show that ACE reduces over 10% actual exploration time compared with classical approaches. We also apply our framework to a high-fidelity visual-based environment, Habitat, achieving 28% improvement in exploration efficiency.
Prediction of Gender from Longitudinal MRI data via Deep Learning on Adolescent Data Reveals Unique Patterns Associated with Brain Structure and Change over a Two-year Period
Bi, Yuda, Abrol, Anees, Fu, Zening, Chen, Jiayu, Liu, Jingyu, Calhoun, Vince
Deep learning algorithms for predicting neuroimaging data have shown considerable promise in various applications. Prior work has demonstrated that deep learning models that take advantage of the data's 3D structure can outperform standard machine learning on several learning tasks. However, most prior research in this area has focused on neuroimaging data from adults. Within the Adolescent Brain and Cognitive Development (ABCD) dataset, a large longitudinal development study, we examine structural MRI data to predict gender and identify gender-related changes in brain structure. Results demonstrate that gender prediction accuracy is exceptionally high (>97%) with training epochs >200 and that this accuracy increases with age. Brain regions identified as the most discriminative in the task under study include predominantly frontal areas and the temporal lobe. When evaluating gender predictive changes specific to a two-year increase in age, a broader set of visual, cingulate, and insular regions are revealed. Our findings show a robust gender-related structural brain change pattern, even over a small age range. This suggests that it might be possible to study how the brain changes during adolescence by looking at how these changes are related to different behavioral and environmental factors.
Multi-agent Covering Option Discovery based on Kronecker Product of Factor Graphs
Chen, Jiayu, Chen, Jingdi, Lan, Tian, Aggarwal, Vaneet
Covering option discovery has been developed to improve the exploration of reinforcement learning in single-agent scenarios with sparse reward signals, through connecting the most distant states in the embedding space provided by the Fiedler vector of the state transition graph. However, these option discovery methods cannot be directly extended to multi-agent scenarios, since the joint state space grows exponentially with the number of agents in the system. Thus, existing researches on adopting options in multi-agent scenarios still rely on single-agent option discovery and fail to directly discover the joint options that can improve the connectivity of the joint state space of agents. In this paper, we show that it is indeed possible to directly compute multi-agent options with collaborative exploratory behaviors among the agents, while still enjoying the ease of decomposition. Our key idea is to approximate the joint state space as a Kronecker graph -- the Kronecker product of individual agents' state transition graphs, based on which we can directly estimate the Fiedler vector of the joint state space using the Laplacian spectrum of individual agents' transition graphs. This decomposition enables us to efficiently construct multi-agent joint options by encouraging agents to connect the sub-goal joint states which are corresponding to the minimum or maximum values of the estimated joint Fiedler vector. The evaluation based on multi-agent collaborative tasks shows that the proposed algorithm can successfully identify multi-agent options, and significantly outperforms prior works using single-agent options or no options, in terms of both faster exploration and higher cumulative rewards.
Variational Automatic Curriculum Learning for Sparse-Reward Cooperative Multi-Agent Problems
Chen, Jiayu, Zhang, Yuanxin, Xu, Yuanfan, Ma, Huimin, Yang, Huazhong, Song, Jiaming, Wang, Yu, Wu, Yi
We introduce a curriculum learning algorithm, Variational Automatic Curriculum Learning (VACL), for solving challenging goal-conditioned cooperative multi-agent reinforcement learning problems. We motivate our paradigm through a variational perspective, where the learning objective can be decomposed into two terms: task learning on the current task distribution, and curriculum update to a new task distribution. Local optimization over the second term suggests that the curriculum should gradually expand the training tasks from easy to hard. Our VACL algorithm implements this variational paradigm with two practical components, task expansion and entity progression, which produces training curricula over both the task configurations as well as the number of entities in the task. Experiment results show that VACL solves a collection of sparse-reward problems with a large number of agents. Particularly, using a single desktop machine, VACL achieves 98% coverage rate with 100 agents in the simple-spread benchmark and reproduces the ramp-use behavior originally shown in OpenAI's hide-and-seek project. Our project website is at https://sites.google.com/view/vacl-neurips-2021.
Fast and Effective Robustness Certification for Recurrent Neural Networks
Ryou, Wonryong, Chen, Jiayu, Balunovic, Mislav, Singh, Gagandeep, Dan, Andrei, Vechev, Martin
We present a precise and scalable verifier for recurrent neural networks, called R2. The verifier is based on two key ideas: (i) a method to compute tight linear convex relaxations of a recurrent update function via sampling and optimization, and (ii) a technique to optimize convex combinations of multiple bounds for each neuron instead of a single bound as previously done. Using R2, we present the first study of certifying a non-trivial use case of recurrent neural networks, namely speech classification. This required us to also develop custom convex relaxations for the general operations that make up speech preprocessing. Our evaluation across a number of recurrent architectures in computer vision and speech domains shows that these networks are out of reach for existing methods as these are an order of magnitude slower than R2, while R2 successfully verified robustness in many cases.
Community-Based Trip Sharing for Urban Commuting
Hasan, Mohd. Hafiz (University of Michigan) | Hentenryck, Pascal Van (University of Michigan) | Budak, Ceren (University of Michigan) | Chen, Jiayu (University of Michigan) | Chaudhry, Chhavi (University of Michigan)
This paper explores Community-Based Trip Sharing which uses the structure of communities and commuting patterns to optimize car or ride sharing for urban communities. It introduces the Commuting Trip Sharing Problem (CTSP) and proposes an optimization approach to maximize trip sharing. The optimization method, which exploits trip clustering, shareability graphs, and mixed-integer programming, is applied to a dataset of 9000 daily commuting trips from a mid-size city. Experimental results show that community-based trip sharing reduces daily car usage by up to 44%, thus producing significant environmental and traffic benefits and reducing parking pressure. The results also indicate that daily flexibility in pairing cars and passengers has significant impact on the benefits of the approach, revealing new insights on commuting patterns and trip sharing.