Markov Models
Gait-Conditioned Reinforcement Learning with Multi-Phase Curriculum for Humanoid Locomotion
Peng, Tianhu, Bao, Lingfan, Zhou, Chengxu
We present a unified gait-conditioned reinforcement learning framework that enables humanoid robots to perform standing, walking, running, and smooth transitions within a single recurrent policy. A compact reward routing mechanism dynamically activates gait-specific objectives based on a one-hot gait ID, mitigating reward interference and supporting stable multi-gait learning. Human-inspired reward terms promote biomechanically natural motions, such as straight-knee stance and coordinated arm-leg swing, without requiring motion capture data. A structured curriculum progressively introduces gait complexity and expands command space over multiple phases. In simulation, the policy successfully achieves robust standing, walking, running, and gait transitions. On the real Unitree G1 humanoid, we validate standing, walking, and walk-to-stand transitions, demonstrating stable and coordinated locomotion. This work provides a scalable, reference-free solution toward versatile and naturalistic humanoid control across diverse modes and environments.
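The reward routing idea can be illustrated with a minimal sketch. It makes several assumptions:

- The gait order (stand, walk, run) is hypothetical.
- Reward terms are written as plain Python callables.
- The toy terms in the usage example are invented for illustration and are not the paper's actual reward design.

The sketch sums the shared terms plus only the terms belonging to the gait selected by the one-hot ID. Objectives for inactive gaits therefore contribute nothing and cannot interfere.

```python
from typing import Callable, Dict, List

# Hypothetical gait ordering; the paper's actual gait indices may differ.
GAITS = ["stand", "walk", "run"]

RewardFn = Callable[[dict], float]


def route_rewards(
    gait_one_hot: List[int],
    gait_rewards: Dict[str, Dict[str, RewardFn]],
    shared_rewards: Dict[str, RewardFn],
    state: dict,
) -> float:
    """Sum the shared terms plus only the terms of the gait selected by the one-hot ID."""
    if sorted(gait_one_hot) != [0] * (len(GAITS) - 1) + [1]:
        raise ValueError("gait_one_hot must contain exactly one 1")
    active = GAITS[gait_one_hot.index(1)]
    total = sum(fn(state) for fn in shared_rewards.values())
    # Terms of inactive gaits are never evaluated, so they contribute nothing.
    total += sum(fn(state) for fn in gait_rewards[active].values())
    return total


# Toy, hypothetical terms: "v" is forward speed, "cmd" is the commanded speed.
shared = {"alive": lambda s: 1.0}
terms = {
    "stand": {"stay_still": lambda s: -abs(s["v"])},
    "walk": {"track_cmd": lambda s: -abs(s["v"] - s["cmd"])},
    "run": {"track_cmd": lambda s: -2.0 * abs(s["v"] - s["cmd"])},
}
print(route_rewards([0, 1, 0], terms, shared, {"v": 1.0, "cmd": 1.0}))
```

Evaluating only the active gait's terms is equivalent to multiplying each gait's reward sum by the matching one-hot entry. It simply skips work for the masked-out gaits.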
FR-Net: Learning Robust Quadrupedal Fall Recovery on Challenging Terrains through Mass-Contact Prediction
Lu, Yidan, Dong, Yinzhao, Zhang, Jiahui, Ma, Ji, Lu, Peng
Fall recovery for legged robots remains challenging, particularly on complex terrains where traditional controllers fail due to incomplete terrain perception and uncertain interactions. We present FR-Net, a learning-based framework that enables quadrupedal robots to recover from arbitrary fall poses across diverse environments. Central to our approach is a Mass-Contact Predictor network that estimates the robot's mass distribution and contact states from limited sensory inputs, facilitating effective recovery strategies. Carefully designed reward functions ensure safe recovery even on steep stairs, avoiding the dangerous rolling motions common in existing methods. The policy is trained entirely in simulation using privileged learning, which guides policy learning without requiring explicit terrain data at deployment. We demonstrate the generalization capabilities of FR-Net across different quadrupedal platforms in simulation and validate its performance through extensive real-world experiments on the Go2 robot in 10 challenging scenarios. Our results indicate that explicit mass-contact prediction is key to robust fall recovery, offering a promising direction for generalizable quadrupedal skills.
Detecting Model Drifts in Non-Stationary Environment Using Edit Operation Measures
Lee, Chang-Hwan, Shim, Alexander
Reinforcement learning (RL) agents typically assume stationary environment dynamics. Yet in real-world applications such as healthcare, robotics, and finance, transition probabilities or reward functions may evolve, leading to model drift. This paper proposes a novel framework to detect such drifts by analyzing the distributional changes in sequences of agent behavior. Specifically, we introduce a suite of edit operation-based measures to quantify deviations between state-action trajectories generated under stationary and perturbed conditions. Our experiments demonstrate that these measures can effectively distinguish drifted from non-drifted scenarios, even under varying levels of noise, providing a practical tool for drift detection in non-stationary RL environments.
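One familiar edit-operation measure is the Levenshtein distance: the minimum number of insertions, deletions, and substitutions needed to turn one state-action sequence into another. The sketch below is an assumption-based illustration, not the paper's exact suite of measures:

- Levenshtein distance stands in as one example of an edit-operation measure.
- Trajectories are sequences of hashable (state, action) pairs.
- `mean_drift_score` is a hypothetical aggregate over paired trajectories.

```python
from typing import Hashable, List, Sequence


def edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> int:
    """Levenshtein distance via dynamic programming, keeping one row at a time."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i] + [0] * len(b)
        for j, y in enumerate(b, start=1):
            curr[j] = min(
                prev[j] + 1,             # delete x
                curr[j - 1] + 1,         # insert y
                prev[j - 1] + (x != y),  # substitute (free when equal)
            )
        prev = curr
    return prev[len(b)]


def normalized_edit_distance(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Scale to [0, 1] by the longer sequence length."""
    longest = max(len(a), len(b))
    return 0.0 if longest == 0 else edit_distance(a, b) / longest


def mean_drift_score(reference: List[Sequence[Hashable]], observed: List[Sequence[Hashable]]) -> float:
    """Hypothetical drift score: mean normalized distance over paired trajectories."""
    pairs = list(zip(reference, observed))
    if not pairs:
        raise ValueError("need at least one trajectory pair")
    return sum(normalized_edit_distance(r, o) for r, o in pairs) / len(pairs)


# Toy trajectories of (state, action) pairs; one action changes under the perturbed dynamics.
stationary = [(0, "R"), (1, "R"), (2, "U")]
perturbed = [(0, "R"), (1, "U"), (2, "U")]
print(edit_distance(stationary, perturbed), mean_drift_score([stationary], [perturbed]))
```

In practice, a score like this would be compared against a threshold calibrated on stationary-versus-stationary pairs. Choosing that threshold is itself an assumption here.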
ManiVID-3D: Generalizable View-Invariant Reinforcement Learning for Robotic Manipulation via Disentangled 3D Representations
Li, Zheng, Qu, Pei, Jia, Yufei, Zhou, Shihui, Ge, Haizhou, Cao, Jiahang, Zhou, Jinni, Zhou, Guyue, Ma, Jun
Deploying visual reinforcement learning (RL) policies in real-world manipulation is often hindered by camera viewpoint changes. A policy trained with a fixed front-facing camera may fail when the camera is shifted, an unavoidable situation in real-world settings where sensor placement is difficult to control precisely. Existing methods often rely on precise camera calibration or struggle with large perspective changes. To address these limitations, we propose ManiVID-3D, a novel 3D RL architecture designed for robotic manipulation, which learns view-invariant representations through self-supervised disentangled feature learning. The framework incorporates ViewNet, a lightweight yet effective module that automatically aligns point cloud observations from arbitrary viewpoints into a unified spatial coordinate system without the need for extrinsic calibration. Additionally, we develop an efficient GPU-accelerated batch rendering module capable of processing over 5000 frames per second, enabling large-scale training for 3D visual RL at unprecedented speeds. Extensive evaluation across 10 simulated and 5 real-world tasks demonstrates that our approach achieves a 44.7% higher success rate than state-of-the-art methods under viewpoint variations while using 80% fewer parameters. The system's robustness to severe perspective changes and strong sim-to-real performance highlight the effectiveness of learning geometrically consistent representations for scalable robotic manipulation in unstructured environments. Our project website is available at https://zheng-joe-lee.github.io/manivid3d/.
GCN-TULHOR: Trajectory-User Linking Leveraging GCNs and Higher-Order Spatial Representations
Tran, Khoa, Gupta, Pranav, Papagelis, Manos
Trajectory-user linking (TUL) aims to associate anonymized trajectories with the users who generated them, which is crucial for personalized recommendations, privacy-preserving analytics, and secure location-based services. Existing methods struggle with sparse data, incomplete routes, and limited modeling of complex spatial dependencies, often relying on low-level check-in data or ignoring spatial patterns. In this paper, we introduce GCN-TULHOR, a method that integrates Graph Convolutional Networks (GCNs) with higher-order mobility flow representations obtained by mapping raw location data onto a hexagonal tessellation, which reduces data sparsity and captures richer spatial semantics. Our approach converts both sparse check-in and continuous GPS trajectory data into unified higher-order flow representations, mitigating sparsity while capturing deeper semantic information. The GCN layer explicitly models complex spatial relationships and non-local dependencies without requiring side information such as timestamps or points of interest. Experiments on six real-world datasets show consistent improvements in accuracy, precision, recall, and F1-score over classical baselines, RNN- and Transformer-based models, and the TULHOR method, with relative gains of 1-8% in accuracy and F1. Sensitivity analysis identifies an optimal configuration with a single GCN layer and 512-dimensional embeddings. Integrating GCNs strengthens spatial learning and improves generalization across mobility data. This work highlights the value of combining graph-based spatial learning with sequential modeling, offering a robust and scalable solution for TUL with applications in recommendations, urban planning, and security.
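To make the GCN component concrete, here is a minimal sketch of one standard graph-convolution layer. It uses the widely used propagation rule ReLU(D^{-1/2}(A + I)D^{-1/2} H W). Two points are assumptions for illustration:

- Hexagonal cells are treated as graph nodes, and edges encode mobility flows between cells.
- The three-cell graph, features, and weights are toy values.

The paper's exact layer, edge weighting, and feature construction may differ.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain nested-list matrix product."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def normalized_adjacency(adj: Matrix) -> Matrix:
    """D^{-1/2} (A + I) D^{-1/2}: add self-loops, then symmetrically normalize by degree."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One GCN layer: ReLU(A_norm @ H @ W)."""
    out = matmul(matmul(normalized_adjacency(adj), features), weight)
    return [[max(0.0, v) for v in row] for row in out]


# Hypothetical example: three hexagonal cells in a row (0-1-2) with 2-d cell features.
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
weight = [[1.0, -1.0], [0.5, 1.0]]
print(gcn_layer(adj, features, weight))
```

Each output row mixes a cell's own features with those of its neighbors. This is how a single layer can already propagate spatial context between adjacent cells.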
Large Foundation Models for Trajectory Prediction in Autonomous Driving: A Comprehensive Survey
Dai, Wei, Wu, Shengen, Wu, Wei, Wang, Zhenhao, Lyu, Sisuo, Liao, Haicheng, Yu, Limin, Ding, Weiping, Guan, Runwei, Yue, Yutao
Trajectory prediction serves as a critical functionality in autonomous driving, enabling the anticipation of future motion paths for traffic participants such as vehicles and pedestrians, which is essential for driving safety. Although conventional deep learning methods have improved accuracy, they remain hindered by inherent limitations, including lack of interpretability, heavy reliance on large-scale annotated data, and weak generalization in long-tail scenarios. The rise of Large Foundation Models (LFMs) is transforming the research paradigm of trajectory prediction. This survey offers a systematic review of recent advances in LFMs, particularly Large Language Models (LLMs) and Multimodal Large Language Models (MLLMs) for trajectory prediction. By integrating linguistic and scene semantics, LFMs facilitate interpretable contextual reasoning, significantly enhancing prediction safety and generalization in complex environments. The article highlights three core methodologies: trajectory-language mapping, multimodal fusion, and constraint-based reasoning. It covers prediction tasks for both vehicles and pedestrians, evaluation metrics, and dataset analyses. Key challenges such as computational latency, data scarcity, and real-world robustness are discussed, along with future research directions including low-latency inference, causality-aware modeling, and motion foundation models.
Adaptive Temporal Fusion Transformers for Cryptocurrency Price Prediction
Peik, Arash, Chahooki, Mohammad Ali Zare, Fard, Amin Milani, Sarram, Mehdi Agha
Precise short-term price prediction in the highly volatile cryptocurrency market is critical for informed trading strategies. Although Temporal Fusion Transformers (TFTs) have shown potential, applying them directly often fails to cope with the market's non-stationarity and extreme volatility. This paper introduces an adaptive TFT modeling approach that leverages dynamic subseries lengths and pattern-based categorization to improve short-term forecasting. We propose a novel segmentation method in which each subseries ends at a relative maximum, identified when the price increase from the preceding minimum exceeds a threshold. This captures significant upward movements, which serve as key markers for the end of a growth phase, while potentially filtering out noise. Crucially, the fixed-length pattern at the end of each subseries determines the category assigned to the subsequent variable-length subseries, so that typical market responses to similar preceding conditions are grouped together. A separate TFT model is trained for each category and specializes in predicting how these subsequent subseries evolve from their initial steps after the preceding peak. Experimental results on 10-minute ETH-USDT data over a two-month test period demonstrate that our adaptive approach significantly outperforms baseline fixed-length TFT and LSTM models in both prediction accuracy and simulated trading profitability. The combination of adaptive segmentation and pattern-conditioned forecasting enables more robust and responsive cryptocurrency price prediction.
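The segmentation and categorization rules can be sketched under an assumed reading of the description:

- A point counts as a relative maximum if it is higher than its predecessor and not lower than its successor.
- It becomes a subseries boundary only if its relative rise above the minimum since the last boundary exceeds the threshold.
- The "fixed-length pattern" is a hypothetical up/down encoding of the last k price changes.

The abstract does not give these specifics, and the paper's exact rules may differ.

```python
from typing import Dict, List


def segment_ends(prices: List[float], threshold: float) -> List[int]:
    """Indices of relative maxima that close a subseries (prices must be positive)."""
    ends: List[int] = []
    low = prices[0] if prices else 0.0
    for i in range(1, len(prices) - 1):
        low = min(low, prices[i])
        is_peak = prices[i] > prices[i - 1] and prices[i] >= prices[i + 1]
        if is_peak and (prices[i] - low) / low > threshold:
            ends.append(i)
            low = prices[i]  # restart the minimum search after each boundary
    return ends


def split_at(prices: List[float], ends: List[int]) -> List[List[float]]:
    """Cut the series so that each end index is the last element of its subseries."""
    bounds = [0] + [e + 1 for e in ends] + [len(prices)]
    return [prices[a:b] for a, b in zip(bounds, bounds[1:]) if a < b]


def tail_pattern(segment: List[float], k: int) -> str:
    """Hypothetical fixed-length pattern: up/down signs of the last k price changes."""
    tail = segment[-(k + 1):]
    return "".join("U" if b > a else "D" for a, b in zip(tail, tail[1:]))  # flat counts as "D"


def categorize(segments: List[List[float]], k: int) -> Dict[int, str]:
    """Give each subseries the category defined by the tail pattern of the one before it."""
    return {i: tail_pattern(segments[i - 1], k) for i in range(1, len(segments))}


# Toy price series with a 5% rise threshold.
prices = [100, 98, 97, 99, 103, 102, 101, 100, 104, 106, 105]
ends = segment_ends(prices, threshold=0.05)
segments = split_at(prices, ends)
print(ends, segments, categorize(segments, k=3))
```

With one model per category, each TFT would be trained only on subseries whose preceding peak ended with the same pattern. That is the conditioning the description relies on.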
Contextuality, Holonomy and Discrete Fiber Bundles in Group-Valued Boltzmann Machines
We propose a geometric extension of restricted Boltzmann machines (RBMs) by allowing weights to take values in abstract groups such as \( \mathrm{GL}_n(\mathbb{R}) \), \( \mathrm{SU}(2) \), or even infinite-dimensional operator groups. This generalization enables the modeling of complex relational structures, including projective transformations, spinor dynamics, and functional symmetries, with direct applications to vision, language, and quantum learning. A central contribution of this work is the introduction of a contextuality index based on group-valued holonomies computed along cycles in the RBM graph. This index quantifies the global inconsistency or "curvature" induced by local weights, generalizing classical notions of coherence, consistency, and geometric flatness. We establish links with sheaf-theoretic contextuality, gauge theory, and noncommutative geometry, and provide numerical and diagrammatic examples in both finite and infinite dimensions. This framework opens novel directions in AI, from curvature-aware learning architectures to topological regularization in uncertain or adversarial environments.
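A holonomy along a cycle is the ordered product of the group-valued weights encountered around it. Traversing an edge against its orientation uses the group inverse. The sketch below makes several assumptions:

- Weights are \( \mathrm{SU}(2) \) matrices on visible-to-hidden edges.
- The product is taken left to right.
- The hypothetical "index" is the Frobenius distance of the holonomy from the identity.

This index definition is an illustration only, not the paper's definition. A flat configuration has zero index; non-commuting weights can produce a nonzero one.

```python
import math
from typing import Dict, List, Tuple

Mat2 = Tuple[Tuple[complex, complex], Tuple[complex, complex]]
IDENTITY: Mat2 = ((1 + 0j, 0j), (0j, 1 + 0j))


def mul(a: Mat2, b: Mat2) -> Mat2:
    """2x2 complex matrix product."""
    return (
        (a[0][0] * b[0][0] + a[0][1] * b[1][0], a[0][0] * b[0][1] + a[0][1] * b[1][1]),
        (a[1][0] * b[0][0] + a[1][1] * b[1][0], a[1][0] * b[0][1] + a[1][1] * b[1][1]),
    )


def dagger(a: Mat2) -> Mat2:
    """Conjugate transpose; equals the inverse for SU(2) elements."""
    return (
        (a[0][0].conjugate(), a[1][0].conjugate()),
        (a[0][1].conjugate(), a[1][1].conjugate()),
    )


def su2(axis: Tuple[float, float, float], theta: float) -> Mat2:
    """cos(theta/2) I - i sin(theta/2) (n . sigma) for a unit axis n."""
    nx, ny, nz = axis
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return ((complex(c, -s * nz), complex(-s * ny, -s * nx)),
            (complex(s * ny, -s * nx), complex(c, s * nz)))


Weights = Dict[Tuple[str, str], Mat2]


def edge_weight(weights: Weights, u: str, v: str) -> Mat2:
    """Weight for traversing u -> v; reversed edges use the group inverse."""
    if (u, v) in weights:
        return weights[(u, v)]
    return dagger(weights[(v, u)])


def holonomy(weights: Weights, cycle: List[str]) -> Mat2:
    """Ordered product of edge weights along a closed cycle."""
    if cycle[0] != cycle[-1]:
        raise ValueError("cycle must start and end at the same node")
    h = IDENTITY
    for u, v in zip(cycle, cycle[1:]):
        h = mul(h, edge_weight(weights, u, v))
    return h


def contextuality_index(weights: Weights, cycle: List[str]) -> float:
    """Hypothetical index: Frobenius distance of the holonomy from the identity."""
    h = holonomy(weights, cycle)
    return math.sqrt(sum(abs(h[i][j] - IDENTITY[i][j]) ** 2 for i in range(2) for j in range(2)))


X, Y, Z = (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)
cycle = ["v0", "h0", "v1", "h1", "v0"]  # a 4-cycle in the bipartite RBM graph
# Commuting z-rotations whose angles cancel around the cycle: a flat configuration.
flat = {("v0", "h0"): su2(Z, 0.3), ("v1", "h0"): su2(Z, 0.5),
        ("v1", "h1"): su2(Z, 0.4), ("v0", "h1"): su2(Z, 0.2)}
# Non-commuting x/y rotations: the holonomy is a group commutator, not the identity.
curved = {("v0", "h0"): su2(X, math.pi / 2), ("v1", "h0"): su2(Y, -math.pi / 2),
          ("v1", "h1"): su2(X, -math.pi / 2), ("v0", "h1"): su2(Y, math.pi / 2)}
print(contextuality_index(flat, cycle), contextuality_index(curved, cycle))
```

Because the RBM graph is bipartite, every cycle alternates between visible and hidden units and has even length. The example therefore uses a 4-cycle.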
A Service-Oriented Adaptive Hierarchical Incentive Mechanism for Federated Learning
Cao, Jiaxing, Gao, Yuzhou, Huang, Jiwei
Recently, federated learning (FL) has emerged as a novel framework for distributed model training. In FL, the task publisher (TP) releases tasks, and local model owners (LMOs) use their local data to train models. When training data are insufficient, workers are recruited to gather additional data. To this end, this paper proposes an adaptive incentive mechanism from a service-oriented perspective, with the objective of maximizing the utilities of the TP, the LMOs, and the workers. Specifically, a Stackelberg game is established between the TP and the LMOs, with the TP as the leader and the LMOs as followers, and an analytical Nash equilibrium solution is derived to maximize their utilities. The interaction between the LMOs and the workers is formulated as a multi-agent Markov decision process (MAMDP), whose optimal strategy is identified via deep reinforcement learning (DRL). Additionally, an Adaptively Searching the Optimal Strategy Algorithm (ASOSA) is designed to stabilize the strategies of all participants and resolve the coupling problems. Extensive numerical experiments validate the efficacy of the proposed method.
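The leader-follower structure can be solved by backward induction: first compute each follower's best response, then let the leader optimize while anticipating those responses. The utilities below are hypothetical and chosen only so the example has a closed form:

- The TP pays a unit reward r.
- Each LMO i picks effort x to maximize r*x - c_i*x**2/2.
- The TP receives lam*log(1 + total effort) minus its payments.

The paper's actual utility functions are not given in the abstract.

```python
import math
from typing import List


def follower_best_response(r: float, cost: float) -> float:
    """LMO maximizes r*x - cost*x**2/2 over x >= 0, giving x* = r / cost."""
    return r / cost


def leader_utility(r: float, costs: List[float], lam: float) -> float:
    """TP utility anticipating best responses: lam*log(1 + total effort) - payment."""
    total = sum(follower_best_response(r, c) for c in costs)
    return lam * math.log(1.0 + total) - r * total


def leader_grid_search(costs: List[float], lam: float, r_max: float = 10.0, steps: int = 100_000) -> float:
    """Backward induction by brute force: pick the reward rate that maximizes TP utility."""
    grid = (r_max * k / steps for k in range(steps + 1))
    return max(grid, key=lambda r: leader_utility(r, costs, lam))


def leader_closed_form(costs: List[float], lam: float) -> float:
    """Solve the first-order condition lam*K/(1 + r*K) = 2*r*K, with K = sum(1/c_i), for r > 0."""
    k = sum(1.0 / c for c in costs)
    return (-1.0 + math.sqrt(1.0 + 2.0 * k * lam)) / (2.0 * k)


costs, lam = [1.0, 2.0], 10.0  # toy LMO cost coefficients and TP valuation
r_star = leader_closed_form(costs, lam)
print(r_star, leader_grid_search(costs, lam), [follower_best_response(r_star, c) for c in costs])
```

With these toy utilities, the TP's objective lam*log(1 + r*K) - K*r**2 is concave in r. The grid search and the analytical solution should therefore agree up to the grid resolution.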
A Convolution and Attention Based Encoder for Reinforcement Learning under Partial Observability
B. Observation History
The core contribution of this work is a novel history encoder for processing historical observations, which integrates two key operations: depthwise separable convolution and multi-head attention. The background of these operations is briefly reviewed below. Depthwise separable convolution [33] is a streamlined variant of standard convolution that reduces both parameter count and computational cost. It decomposes the operation into two steps: (1) a depthwise convolution, which applies a single filter to each input channel, and (2) a pointwise convolution, which uses a 1×1 convolution to linearly combine the outputs of the depthwise stage. This factorization enables efficient extraction of spatial and cross-channel features while maintaining strong representational capacity. It has been widely adopted in lightweight neural architectures such as MobileNet [34] and is particularly well suited to real-time and resource-constrained applications. Multi-head attention [9] is a fundamental component of Transformer architectures, enabling the model to capture diverse patterns across different representation subspaces.
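The two-step factorization can be shown with a minimal one-dimensional sketch. It assumes an observation history laid out as channels × time steps, valid padding, stride 1, and no bias; these choices are illustrative and are not the paper's encoder configuration. The helper `param_counts` compares weight counts: a standard convolution needs C_in·C_out·K weights, while the separable version needs C_in·K + C_in·C_out.

```python
from typing import List, Tuple

Seq = List[List[float]]  # [channels][time steps]


def depthwise_conv1d(x: Seq, kernels: List[List[float]]) -> Seq:
    """Step 1: apply one kernel per input channel (valid padding, stride 1, no bias)."""
    out = []
    for channel, kernel in zip(x, kernels):
        k = len(kernel)
        out.append([sum(channel[t + j] * kernel[j] for j in range(k))
                    for t in range(len(channel) - k + 1)])
    return out


def pointwise_conv1d(x: Seq, weights: List[List[float]]) -> Seq:
    """Step 2: 1x1 convolution, a per-time-step linear mix of the input channels."""
    steps = len(x[0])
    return [[sum(w[c] * x[c][t] for c in range(len(x))) for t in range(steps)]
            for w in weights]


def depthwise_separable_conv1d(x: Seq, dw_kernels: List[List[float]], pw_weights: List[List[float]]) -> Seq:
    """Depthwise convolution followed by pointwise convolution."""
    return pointwise_conv1d(depthwise_conv1d(x, dw_kernels), pw_weights)


def param_counts(c_in: int, c_out: int, k: int) -> Tuple[int, int]:
    """Weight counts (no bias) for a standard vs. a depthwise separable convolution."""
    return c_in * c_out * k, c_in * k + c_in * c_out


# Toy history: 2 observation channels over 4 time steps.
history = [[1, 2, 3, 4], [0, 1, 0, 1]]
print(depthwise_separable_conv1d(history, [[1, 1], [1, -1]], [[1, 1], [2, 0]]))
print(param_counts(32, 64, 3))
```

For 32 input channels, 64 output channels, and a kernel of length 3, the factorization needs roughly a third of the weights of a standard convolution. The savings grow with the number of output channels.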