Gao, Zijian
Exploring structure diversity in atomic resolution microscopy with graph neural networks
Luo, Zheng, Feng, Ming, Gao, Zijian, Yu, Jinyang, Hu, Liang, Wang, Tao, Xue, Shenao, Zhou, Shen, Ouyang, Fangping, Feng, Dawei, Xu, Kele, Wang, Shanshan
The emergence of deep learning (DL) has opened great opportunities for the high-throughput analysis of atomic-resolution micrographs. However, DL models trained on fixed-size image patches generally lack efficiency and flexibility when processing micrographs containing diverse atomic configurations. Herein, inspired by the similarity between atomic structures and graphs, we describe a few-shot learning framework based on an equivariant graph neural network (EGNN) to analyze a library of atomic structures (e.g., vacancies, phases, grain boundaries, and doping), achieving significantly improved robustness and a three-orders-of-magnitude reduction in model parameters compared with image-driven DL models; the advantage is especially evident for aggregated vacancy lines with flexible lattice distortion. Moreover, the intuitiveness of graphs enables quantitative, straightforward extraction of atomic-scale structural features in batches, statistically unveiling the self-assembly dynamics of vacancy lines under electron beam irradiation. A versatile model toolkit is established by integrating EGNN sub-models for single-structure recognition into a task chain that processes images with varied configurations, leading to the discovery of novel doping configurations with superior electrocatalytic properties for the hydrogen evolution reaction. This work provides a powerful tool to explore structural diversity in a fast, accurate, and intelligent manner.
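As a rough illustration of the graph-based approach this abstract describes, below is a minimal sketch (plain PyTorch, not the authors' released code) of one E(n)-equivariant message-passing layer acting on an "atom graph" of 2D coordinates and node features. The layer structure, feature dimensions, and dense pairwise formulation are illustrative assumptions.

```python
# Minimal sketch (not the authors' code): one E(n)-equivariant message-passing
# layer in the style of Satorras et al., operating on a small "atom graph"
# built from 2D atomic coordinates extracted from a micrograph.
import torch
import torch.nn as nn


class EGNNLayer(nn.Module):
    """Updates node features h and coordinates x from pairwise messages."""

    def __init__(self, dim: int = 32):
        super().__init__()
        self.phi_e = nn.Sequential(nn.Linear(2 * dim + 1, dim), nn.SiLU(),
                                   nn.Linear(dim, dim), nn.SiLU())
        self.phi_x = nn.Sequential(nn.Linear(dim, dim), nn.SiLU(),
                                   nn.Linear(dim, 1))
        self.phi_h = nn.Sequential(nn.Linear(2 * dim, dim), nn.SiLU(),
                                   nn.Linear(dim, dim))

    def forward(self, h, x):
        # h: (N, dim) node features, x: (N, 2) atom coordinates.
        n = h.shape[0]
        diff = x[:, None, :] - x[None, :, :]              # (N, N, 2)
        dist2 = (diff ** 2).sum(-1, keepdim=True)         # (N, N, 1)
        h_i = h[:, None, :].expand(n, n, -1)
        h_j = h[None, :, :].expand(n, n, -1)
        m = self.phi_e(torch.cat([h_i, h_j, dist2], dim=-1))  # pairwise messages
        mask = 1.0 - torch.eye(n).unsqueeze(-1)                # drop self-pairs
        m = m * mask
        # Equivariant coordinate update and invariant feature update.
        x_new = x + (diff * self.phi_x(m) * mask).mean(dim=1)
        h_new = h + self.phi_h(torch.cat([h, m.sum(dim=1)], dim=-1))
        return h_new, x_new


# Toy usage: five "atoms" with random positions and random node features.
coords = torch.rand(5, 2)
feats = torch.randn(5, 32)
layer = EGNNLayer(dim=32)
feats, coords = layer(feats, coords)
print(feats.shape, coords.shape)   # torch.Size([5, 32]) torch.Size([5, 2])
```

Because the layer depends on coordinates only through pairwise differences and distances, its outputs transform consistently under rotations and translations of the lattice, which is the property that makes graph-based recognition robust to lattice distortion.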
Optimistic Model Rollouts for Pessimistic Offline Policy Optimization
Zhai, Yuanzhao, Li, Yiying, Gao, Zijian, Gong, Xudong, Xu, Kele, Feng, Dawei, Bo, Ding, Wang, Huaimin
Model-based offline reinforcement learning (RL) has made remarkable progress, offering a promising avenue for improving generalization with synthetic model rollouts. Existing works primarily focus on incorporating pessimism for policy optimization, usually via constructing a Pessimistic Markov Decision Process (P-MDP). However, the P-MDP discourages the policies from learning in out-of-distribution (OOD) regions beyond the support of offline datasets, which can under-utilize the generalization ability of dynamics models. In contrast, we propose constructing an Optimistic MDP (O-MDP). We initially observed the potential benefits of optimism brought by encouraging more OOD rollouts. Motivated by this observation, we present ORPO, a simple yet effective model-based offline RL framework. ORPO generates Optimistic model Rollouts for Pessimistic offline policy Optimization. Specifically, we train an optimistic rollout policy in the O-MDP to sample more OOD model rollouts. Then we relabel the sampled state-action pairs with penalized rewards and optimize the output policy in the P-MDP. Theoretically, we demonstrate that the performance of policies trained with ORPO can be lower-bounded in linear MDPs. Experimental results show that our framework significantly outperforms P-MDP baselines by a margin of 30%, achieving state-of-the-art performance on the widely-used benchmark. Moreover, ORPO exhibits notable advantages in problems that require generalization.
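As a hedged illustration of the reward-relabeling step mentioned above, the sketch below penalizes rollout rewards by the disagreement of a small dynamics-model ensemble, a common uncertainty proxy in model-based offline RL. The function name, the toy LinearDynamicsModel, and the ensemble-disagreement penalty are assumptions; the exact penalty used by ORPO may differ.

```python
# Minimal sketch (assumptions, not the paper's implementation): relabel
# optimistic rollouts with penalized rewards before pessimistic policy
# optimization.  Uncertainty is approximated by ensemble disagreement.
import numpy as np


def relabel_with_penalty(states, actions, rewards, ensemble, lam=1.0):
    """Return pessimistic rewards r - lam * uncertainty(s, a)."""
    # Each model predicts the next state; disagreement across the ensemble
    # serves as an uncertainty estimate for out-of-distribution pairs.
    preds = np.stack([m.predict(states, actions) for m in ensemble])  # (E, N, d)
    uncertainty = preds.std(axis=0).max(axis=-1)                      # (N,)
    return rewards - lam * uncertainty


class LinearDynamicsModel:
    """Toy stand-in for a learned dynamics model."""

    def __init__(self, rng):
        self.W = rng.normal(size=(3, 2))  # (state_dim + action_dim, state_dim)

    def predict(self, states, actions):
        return np.concatenate([states, actions], axis=-1) @ self.W


rng = np.random.default_rng(0)
ensemble = [LinearDynamicsModel(np.random.default_rng(i)) for i in range(5)]
s = rng.normal(size=(4, 2))       # rollout states
a = rng.normal(size=(4, 1))       # rollout actions
r = rng.normal(size=4)            # raw rollout rewards
print(relabel_with_penalty(s, a, r, ensemble, lam=0.5))
```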
Dynamic Memory-based Curiosity: A Bootstrap Approach for Exploration
Gao, Zijian, Li, YiYing, Xu, Kele, Zhai, Yuanzhao, Feng, Dawei, Ding, Bo, Mao, XinJun, Wang, Huaimin
The sparsity of extrinsic rewards poses a serious challenge for reinforcement learning (RL). Many recent efforts have focused on curiosity, which can provide a representative intrinsic reward for effective exploration, yet the challenge is still far from solved. In this paper, we present a novel curiosity formulation for RL, named DyMeCu, which stands for Dynamic Memory-based Curiosity. Inspired by human curiosity and information theory, DyMeCu consists of a dynamic memory and dual online learners. Curiosity is aroused when the memorized information cannot account for the current state; the information gap between the dual learners is formulated as the intrinsic reward for the agent, and the state information is then consolidated into the dynamic memory. Compared with previous curiosity methods, DyMeCu better mimics human curiosity through its dynamic memory, and the memory module can be dynamically grown following a bootstrap paradigm with dual learners. On multiple benchmarks, including the DeepMind Control Suite and the Atari Suite, large-scale empirical experiments demonstrate that DyMeCu outperforms competitive curiosity-based methods with or without extrinsic rewards. We will release the code to enhance reproducibility.
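A minimal sketch of the mechanism described above, under the assumption that the dynamic memory is an EMA-updated network and that the dual learners regress onto its representations; the architecture, loss, and update rule below are illustrative, not the released DyMeCu code.

```python
# Minimal sketch (assumptions, not the released DyMeCu code): a dynamic memory
# network updated by exponential moving average, and two online learners that
# predict the memory's representation of the current state.  The gap between
# the two learners is used as the intrinsic reward, and the memory is then
# consolidated toward the learners (a bootstrap-style update).
import torch
import torch.nn as nn


def make_encoder(obs_dim=8, rep_dim=16):
    return nn.Sequential(nn.Linear(obs_dim, 32), nn.ReLU(), nn.Linear(32, rep_dim))


memory = make_encoder()                                   # dynamic memory network
learner_a, learner_b = make_encoder(), make_encoder()     # dual online learners


def intrinsic_reward(obs):
    # Disagreement between the two learners (simple L2 gap) is the curiosity bonus.
    with torch.no_grad():
        za, zb = learner_a(obs), learner_b(obs)
    return ((za - zb) ** 2).mean(dim=-1)


def learner_loss(obs):
    # Learners are trained to match the memory's representation of the state.
    with torch.no_grad():
        target = memory(obs)
    return ((learner_a(obs) - target) ** 2).mean() + ((learner_b(obs) - target) ** 2).mean()


@torch.no_grad()
def consolidate_memory(tau=0.01):
    # EMA update: slowly absorb the learners' knowledge into the memory.
    avg = [(pa + pb) / 2 for pa, pb in zip(learner_a.parameters(), learner_b.parameters())]
    for pm, pl in zip(memory.parameters(), avg):
        pm.mul_(1 - tau).add_(tau * pl)


obs = torch.randn(5, 8)            # batch of observations
print(intrinsic_reward(obs))       # per-state curiosity bonus
loss = learner_loss(obs)           # would be minimized to train the learners
consolidate_memory()
```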
Self-Supervised Exploration via Temporal Inconsistency in Reinforcement Learning
Gao, Zijian, Xu, Kele, Zhai, Yuanzhao, Feng, Dawei, Ding, Bo, Mao, XinJun, Wang, Huaimin
Under sparse extrinsic reward settings, reinforcement learning remains challenging despite surging interest in the field. Previous attempts suggest that intrinsic rewards can alleviate the issues caused by sparsity. In this article, we present a novel intrinsic reward inspired by human learning, as humans evaluate curiosity by comparing current observations with historical knowledge. Our method involves training a self-supervised prediction model, saving snapshots of the model parameters, and using the nuclear norm to evaluate the temporal inconsistency between the predictions of different snapshots as the intrinsic reward. We also propose a variational weighting mechanism to assign weights to different snapshots in an adaptive manner. Experimental results on various benchmark environments demonstrate the efficacy of our method, which outperforms other intrinsic reward-based methods without additional training costs and with higher noise tolerance.
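One plausible reading of this intrinsic reward, sketched under assumptions: predictions from several saved snapshots of a self-supervised predictor are stacked into a matrix whose nuclear norm serves as the bonus. Whether the norm is applied to raw predictions or their differences, and how the adaptive snapshot weighting works, is not reproduced here.

```python
# Minimal sketch (assumptions, not the authors' implementation): keep snapshots
# of a self-supervised prediction model taken at different training steps, and
# score a state by the temporal inconsistency of their predictions, measured
# with the nuclear norm of the stacked prediction matrix.
import torch
import torch.nn as nn

predictor = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 10))

# Snapshots of the predictor's parameters; in practice these would be saved
# during training, here they are slightly perturbed copies for illustration.
snapshots = [
    {k: v + 0.01 * torch.randn_like(v) for k, v in predictor.state_dict().items()}
    for _ in range(3)
]


def intrinsic_reward(obs_action):
    # Run every snapshot on the same input and stack the predicted next-state
    # features into a matrix; its nuclear norm grows when the snapshots
    # disagree, i.e. when the model is still revising this region of the
    # state space.
    preds = []
    with torch.no_grad():
        for params in snapshots:
            predictor.load_state_dict(params)
            preds.append(predictor(obs_action))
    P = torch.stack(preds, dim=0)                    # (num_snapshots, feat_dim)
    return torch.linalg.matrix_norm(P, ord="nuc")    # scalar intrinsic reward


print(intrinsic_reward(torch.randn(10)))
```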
KnowSR: Knowledge Sharing among Homogeneous Agents in Multi-agent Reinforcement Learning
Gao, Zijian, Xu, Kele, Ding, Bo, Wang, Huaimin, Li, Yiying, Jia, Hongda
Recently, deep reinforcement learning (RL) algorithms have made great progress in the multi-agent domain. However, owing to the characteristics of RL, training for complex tasks can be resource-intensive and time-consuming. To meet this challenge, a mutual learning strategy between homogeneous agents is essential, yet it remains under-explored in previous studies because most existing methods do not consider using the knowledge held by the agent models. In this paper, we present "KnowSR", an adaptation applicable to the majority of multi-agent reinforcement learning (MARL) algorithms, which takes advantage of the differences in learning between agents. We employ the idea of knowledge distillation (KD) to share knowledge among agents and thereby shorten the training phase. To empirically demonstrate the robustness and effectiveness of KnowSR, we performed extensive experiments with state-of-the-art MARL algorithms in collaborative and competitive scenarios. The results demonstrate that KnowSR outperforms recently reported methods, emphasizing the importance of the proposed knowledge sharing for MARL.
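A hedged sketch of what a knowledge-sharing term of this kind could look like: a standard temperature-scaled KL distillation loss that pulls one agent's policy toward a peer agent's policy on a shared batch of observations. The weighting, temperature, and choice of teacher agent are assumptions, not the paper's exact formulation.

```python
# Minimal sketch (assumptions): peer-to-peer policy distillation among
# homogeneous agents, added as an auxiliary term to the usual MARL loss.
import torch
import torch.nn.functional as F


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) over action distributions, as in standard KD."""
    t = temperature
    log_p_student = F.log_softmax(student_logits / t, dim=-1)
    p_teacher = F.softmax(teacher_logits / t, dim=-1)
    return F.kl_div(log_p_student, p_teacher, reduction="batchmean") * (t * t)


# Toy usage: agent A (lagging) distills from peer agent B (leading) on a shared batch.
logits_a = torch.randn(32, 5, requires_grad=True)    # agent A's action logits
logits_b = torch.randn(32, 5)                        # peer agent B's logits
beta = 0.1                                           # weight of the sharing term
rl_loss = torch.tensor(0.0)                          # placeholder for the usual MARL loss
total_loss = rl_loss + beta * distillation_loss(logits_a, logits_b.detach())
total_loss.backward()
```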
KnowRU: Knowledge Reusing via Knowledge Distillation in Multi-agent Reinforcement Learning
Gao, Zijian, Xu, Kele, Ding, Bo, Wang, Huaimin, Li, Yiying, Jia, Hongda
Recently, deep reinforcement learning (RL) algorithms have achieved dramatic progress in the multi-agent area. However, training on increasingly complex tasks can be time-consuming and resource-exhausting. To alleviate this problem, efficiently leveraging historical experience is essential, yet this is under-explored in previous studies, as most existing methods fail to achieve it in a continuously varying system owing to their complicated designs and environmental dynamics. In this paper, we propose "KnowRU", a method for knowledge reusing that can be easily deployed in the majority of multi-agent reinforcement learning (MARL) algorithms without complicated hand-coded design. We employ the knowledge distillation paradigm to transfer knowledge among agents, with the goal of accelerating the training phase for new tasks while improving the asymptotic performance of agents. To empirically demonstrate the robustness and effectiveness of KnowRU, we perform extensive experiments with state-of-the-art MARL algorithms in collaborative and competitive scenarios. The results show that KnowRU outperforms recently reported methods, emphasizing the importance of the proposed knowledge reusing for MARL.
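For contrast with the peer-sharing sketch after KnowSR above, here is a hedged sketch of knowledge reusing across tasks: a frozen policy from a related previous task acts as a teacher, and the distillation weight is annealed so reused knowledge mainly accelerates the early phase of training on the new task. The annealing schedule, temperature, and network shapes are illustrative assumptions, not the released KnowRU code.

```python
# Minimal sketch (assumptions): reuse a policy trained on a related previous
# task as a frozen teacher, with an annealed distillation weight.
import torch
import torch.nn as nn
import torch.nn.functional as F

obs_dim, n_actions = 8, 4
teacher = nn.Linear(obs_dim, n_actions)   # stands in for a policy from an old task
student = nn.Linear(obs_dim, n_actions)   # new agent being trained on the new task
teacher.requires_grad_(False)


def reuse_loss(obs, step, total_steps, temperature=1.0):
    # Annealed weight: strong imitation early in training, pure RL later.
    alpha = max(0.0, 1.0 - step / (0.5 * total_steps))
    log_p_student = F.log_softmax(student(obs) / temperature, dim=-1)
    p_teacher = F.softmax(teacher(obs) / temperature, dim=-1)
    kd = F.kl_div(log_p_student, p_teacher, reduction="batchmean")
    return alpha * kd


obs = torch.randn(16, obs_dim)
print(reuse_loss(obs, step=1_000, total_steps=100_000))
```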
Random forest model identifies serve strength as a key predictor of tennis match outcome
Gao, Zijian, Kowalczyk, Amanda
Tennis is a popular sport worldwide, boasting millions of fans and numerous national and international tournaments. Like many sports, tennis has benefited from the popularity of rigorous record-keeping of game and player information, as well as the growth of machine learning methods for use in sports analytics. Of particular interest to bettors and betting companies alike is the potential use of sports records to predict tennis match outcomes prior to match start. We compiled, cleaned, and used the largest database of tennis match information to date to predict match outcome using fairly simple machine learning methods. Using such methods allows for rapid fit and prediction times to readily incorporate new data and make real-time predictions. We were able to predict match outcomes with upwards of 80% accuracy, much greater than predictions using betting odds alone, and identify serve strength as a key predictor of match outcome. By combining prediction accuracies from three models, we were able to nearly recreate a probability distribution based on average betting odds from betting companies, which indicates that betting companies are using similar information to assign odds to matches. These results demonstrate the capability of relatively simple machine learning models to quite accurately predict tennis match outcomes.
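A minimal sketch in the spirit of the analysis described above, with entirely hypothetical feature names and synthetic data rather than the paper's actual database or pipeline: a scikit-learn random forest over per-match feature differences, whose feature importances can then be inspected for serve-related predictors.

```python
# Minimal sketch (hypothetical features and synthetic data, not the paper's
# pipeline): a random forest classifier over per-match player statistics,
# with serve-related features among the inputs.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_matches = 2000
# Columns are illustrative: rank difference, first-serve %, aces per game,
# break points saved %, head-to-head win rate (all as differences P1 - P2).
X = rng.normal(size=(n_matches, 5))
y = (X[:, 1] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=n_matches) > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
# Feature importances point to the serve-related columns in this toy setup.
print("importances:", model.feature_importances_.round(3))
```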