Dai, Min
Learning Merton's Strategies in an Incomplete Market: Recursive Entropy Regularization and Biased Gaussian Exploration
Dai, Min, Dong, Yuchao, Jia, Yanwei, Zhou, Xun Yu
We study Merton's expected utility maximization problem in an incomplete market, characterized by a factor process in addition to the stock price process, where all the model primitives are unknown. We take the reinforcement learning (RL) approach to learn optimal portfolio policies directly by exploring the unknown market, without attempting to estimate the model parameters. Based on the entropy-regularization framework for general continuous-time RL formulated in Wang et al. (2020), we propose a recursive weighting scheme on exploration that endogenously discounts the current exploration reward by the cumulative amount of past exploration. Such a recursive regularization restores the optimality of Gaussian exploration. However, contrary to the existing results, the optimal Gaussian policy turns out to be biased in general, due to the intertwined needs for hedging and for exploration. We present an asymptotic analysis of the resulting errors to show how the level of exploration affects the learned policies. Furthermore, we establish a policy improvement theorem and design several RL algorithms to learn Merton's optimal strategies. Finally, we carry out both simulation and empirical studies with a stochastic volatility environment to demonstrate the efficiency and robustness of the RL algorithms in comparison to the conventional plug-in method.
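As a rough illustration of what Gaussian exploration means for a portfolio weight, the sketch below samples the stock fraction from a normal distribution and reports its differential entropy. It is not the paper's algorithm: it assumes the classical complete-market Black-Scholes benchmark with CRRA utility, where the Merton fraction is (mu - r) / (gamma * sigma^2), and the exploration standard deviation `std` and all function names are hypothetical choices. The abstract reports that the optimal Gaussian policy is biased in general in the incomplete market, so a sampler centered on this benchmark should be read only as a baseline picture.

```python
import math
import random


def merton_fraction(mu, r, sigma, gamma):
    """Classical Merton stock fraction for CRRA utility in a Black-Scholes market."""
    return (mu - r) / (gamma * sigma ** 2)


def gaussian_entropy(std):
    """Differential entropy of a one-dimensional normal with standard deviation std."""
    return 0.5 * math.log(2.0 * math.pi * math.e * std ** 2)


def sample_exploratory_weights(mean, std, n, seed=0):
    """Draw n portfolio weights from N(mean, std^2); std is a hypothetical exploration level."""
    rng = random.Random(seed)
    return [rng.gauss(mean, std) for _ in range(n)]


if __name__ == "__main__":
    # Illustrative parameters only, not taken from the paper.
    pi_star = merton_fraction(mu=0.08, r=0.02, sigma=0.2, gamma=2.0)
    weights = sample_exploratory_weights(pi_star, std=0.3, n=10_000)
    print("benchmark fraction:", pi_star)
    print("sample mean:", sum(weights) / len(weights))
    print("policy entropy:", gaussian_entropy(0.3))
```

A larger `std` raises the entropy (more exploration) at the cost of noisier portfolio weights, which is the trade-off that entropy regularization prices in.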
Multi-Domain Walking with Reduced-Order Models of Locomotion
Dai, Min, Lee, Jaemin, Ames, Aaron D.
Drawing inspiration from human multi-domain walking, this work presents a novel reduced-order-model-based framework for realizing multi-domain robotic walking. At the core of our approach is the viewpoint that human walking can be represented by a hybrid dynamical system, with continuous phases that are fully-actuated, under-actuated, and over-actuated and discrete changes in actuation type occurring with changes in contact. Leveraging this perspective, we synthesize a multi-domain linear inverted pendulum (MLIP) model of locomotion. Utilizing the step-to-step dynamics of the MLIP model, we successfully demonstrate multi-domain walking behaviors on the bipedal robot Cassie -- a high-degree-of-freedom 3D bipedal robot. Thus, we show the ability to bridge the gap between multi-domain reduced-order models and full-order multi-contact locomotion. Additionally, our results showcase the ability of the proposed method to achieve versatile speed-tracking performance and robust push recovery behaviors.
Multi-task Meta Label Correction for Time Series Prediction
Yang, Luxuan, Gao, Ting, Wei, Wei, Dai, Min, Fang, Cheng, Duan, Jinqiao
Time series classification faces two unavoidable problems: partial feature information and poor label quality, both of which may degrade model performance. To address these issues, we propose a label correction method for time series data that uses meta-learning under a multi-task framework. There are three main contributions. First, we train the label correction model with a two-branch neural network in the outer loop, while in the model-agnostic inner loop we use pre-existing classification models in a multi-task way and jointly update the meta-knowledge, which enables adaptive labeling on complex time series. Second, we devise new data visualization methods for both image patterns of the historical data and data in the prediction horizon. Finally, we test our method on various financial datasets, including XOM, S&P 500, and SZ50. Results show that our method is more effective and accurate than some existing label correction techniques.
Data-driven Adaptation for Robust Bipedal Locomotion with Step-to-Step Dynamics
Dai, Min, Xiong, Xiaobin, Lee, Jaemin, Ames, Aaron D.
This paper presents an online framework for synthesizing agile locomotion for bipedal robots that adapts to unknown environments, modeling errors, and external disturbances. To this end, we leverage step-to-step (S2S) dynamics which has proven effective in realizing dynamic walking on underactuated robots -- assuming known dynamics and environments. This paper considers the case of uncertain models and environments and presents a data-driven representation of the S2S dynamics that can be learned via an adaptive control approach that is both data-efficient and easy to implement. The learned S2S controller generates desired discrete foot placement, which is then realized on the full-order dynamics of the bipedal robot by tracking desired outputs synthesized from the given foot placement. The benefits of the proposed approach are twofold. First, it improves the ability of the robot to walk at a given desired velocity when compared to the non-adaptive baseline controller. Second, the data-driven approach enables stable and agile locomotion under the effect of various unknown disturbances: additional unmodeled payload, large robot model errors, external disturbance forces, biased velocity estimation, and sloped terrains. This is demonstrated through in-depth evaluation with a high-fidelity simulation of the bipedal robot Cassie subject to the aforementioned disturbances.
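To make the idea of step-to-step dynamics with foot placement concrete, here is a minimal sketch based on the textbook planar linear inverted pendulum (LIP), not the data-driven representation learned in the paper. It assumes a constant center-of-mass height `z0`, a fixed step duration `T`, instantaneous stance-foot switching, and a state x = [p, v] (center-of-mass position relative to the stance foot, and velocity) sampled just before each step; the step length u is the control. The deadbeat gain from Ackermann's formula and all function names are illustrative choices.

```python
import math


def lip_s2s(z0=0.9, T=0.4, g=9.81):
    """Return (A, B, lam) for x_{k+1} = A x_k + B u_k under the LIP assumptions."""
    lam = math.sqrt(g / z0)
    c, s = math.cosh(lam * T), math.sinh(lam * T)
    A = [[c, s / lam], [lam * s, c]]
    # Stepping by u shifts the stance foot (p -> p - u), so B = A @ [-1, 0].
    B = [-c, -lam * s]
    return A, B, lam


def matvec(X, v):
    return [X[0][0] * v[0] + X[0][1] * v[1], X[1][0] * v[0] + X[1][1] * v[1]]


def deadbeat_gain(A, B):
    """Ackermann's formula for u = -K e with both closed-loop poles at zero."""
    AB = matvec(A, B)
    det = B[0] * AB[1] - AB[0] * B[1]          # determinant of [B, AB]
    row = [-B[1] / det, B[0] / det]            # last row of inverse([B, AB])
    A2 = [[sum(A[i][k] * A[k][j] for k in range(2)) for j in range(2)]
          for i in range(2)]
    return [row[0] * A2[0][0] + row[1] * A2[1][0],
            row[0] * A2[0][1] + row[1] * A2[1][1]]


def periodic_orbit(v_des, T, lam):
    """Period-1 orbit with pre-impact velocity v_des: returns (x_star, u_star)."""
    c, s = math.cosh(lam * T), math.sinh(lam * T)
    p_star = v_des * (c - 1.0) / (lam * s)
    return [p_star, v_des], 2.0 * p_star


def simulate(x0, v_des, steps, z0=0.9, T=0.4):
    """Apply the foot-placement law u = u_star - K (x - x_star) for a number of steps."""
    A, B, lam = lip_s2s(z0, T)
    K = deadbeat_gain(A, B)
    x_star, u_star = periodic_orbit(v_des, T, lam)
    x = list(x0)
    for _ in range(steps):
        e = [x[0] - x_star[0], x[1] - x_star[1]]
        u = u_star - (K[0] * e[0] + K[1] * e[1])
        Ax = matvec(A, x)
        x = [Ax[0] + B[0] * u, Ax[1] + B[1] * u]
    return x, x_star


if __name__ == "__main__":
    x, x_star = simulate([0.1, 0.2], v_des=0.5, steps=2)
    print("state after two steps:", x)
    print("target orbit state:   ", x_star)
```

On this idealized model the deadbeat law reaches the target orbit in two steps. On a real robot the true step-to-step map differs from the LIP one, which is the gap that a data-driven, adaptively updated model is meant to close.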