state 1
Understanding Long-Term Dynamics of Individual Metro Usage: A Hidden Semi-Markov State Framework with Survival Analysis
Wang, Bingxun, Urbano, Valeria Maria, He, Shan, Chen, Yang, Liu, Wei, Jiang, Zhibin, Secchi, Piercesare
Understanding how individual metro usage evolves over multi-year horizons is essential for transit planning and passenger retention. However, existing approaches typically characterize mobility patterns as static clusters or short-term variability, leaving the lifecycle dynamics of transit participation underexplored. This study proposes a state-based lifecycle modeling framework that integrates Hidden Semi-Markov Models (HSMM) with discrete-time survival analysis to characterize the evolution of individual metro mobility. The HSMM infers latent mobility states with explicit duration distributions and a transition matrix governing regime changes, while the survival component models exit and re-entry events via state-dependent hazard functions conditioned on mobility-state trajectories and behavioral history. Applied to four years of smart card data from the Shanghai metro system (2021-2024), the framework enables the identification of interpretable mobility states, the characterization of transition dynamics, and the quantification of state-dependent exit and re-entry processes. The analysis reveals five robust mobility states with a directional transition hierarchy centered on an occasional-usage gateway state, and fundamentally different temporal mechanisms governing disengagement and return: exit hazard is state-dependent but duration-independent, whereas re-entry hazard decays sharply with inactivity length. These findings provide a methodological foundation for lifecycle-oriented mobility analysis and practical guidance for transit operators to identify at-risk users and time retention interventions.
A Proofs and Derivation A.1 Proof for Theorem
Let's follow the notations in Alg. 3 of Argmax Flow. We can unfold the determinant by the i-th row. This is illustrated in Figure A.1, where the adaptive Further details can be found in Tables A.2. Furthermore, we will make the code used to reproduce these results publicly available. In different environments, different state encoders were exploited. We used MLP encoder for discrete control tasks and CNN encoder for Pistonball task.
Toggling stiffness via multistability
Oliveira, Hugo de Souza, Curatolo, Michele, Sachse, Renate, Milana, Edoardo
Mechanical metamaterials enable unconventional and programmable mechanical responses through structural design rather than material composition. In this work, we introduce a multistable mechanical metamaterial that exhibits a toggleable stiffness effect, where the effective shear stiffness switches discretely between stable configurations. The mechanical analysis of surrogate beam models of the unit cell reveal that this behavior originates from the rotation transmitted by the support beams to the curved beam, which governs the balance between bending and axial deformation. The stiffness ratio between the two states of the unit cell can be tuned by varying the slenderness of the support beams or by incorporating localized hinges that modulate rotational transfer. Experiments on 3D-printed prototypes validate the numerical predictions, confirming consistent stiffness toggling across different geometries. Finally, we demonstrate a monolithic soft clutch that leverages this effect to achieve programmable, stepwise stiffness modulation. This work establishes a design strategy for toggleable stiffness using multistable metamaterials, paving the way for adaptive, lightweight, and autonomous systems in soft robotics and smart structures.
A Proofs and Derivation
Let's follow the notations in Alg. 3 of Argmax Flow. We can unfold the determinant by the i-th row. This is illustrated in Figure A.1, where the adaptive Further details can be found in Tables A.2. Furthermore, we will make the code used to reproduce these results publicly available. In different environments, different state encoders were exploited. We used MLP encoder for discrete control tasks and CNN encoder for Pistonball task.
Deliberate Planning in Language Models with Symbolic Representation
Xiong, Siheng, Liu, Zhangding, Zhou, Jieyu, Su, Yusen
Planning remains a core challenge for large language models (LLMs), particularly in domains that require coherent multi-step action sequences grounded in external constraints. We introduce SymPlanner, a novel framework that equips LLMs with structured planning capabilities by interfacing them with a symbolic environment that serves as an explicit world model. Rather than relying purely on natural language reasoning, SymPlanner grounds the planning process in a symbolic state space, where a policy model proposes actions and a symbolic environment deterministically executes and verifies their effects. To enhance exploration and improve robustness, we introduce Iterative Correction (IC), which refines previously proposed actions by leveraging feedback from the symbolic environment to eliminate invalid decisions and guide the model toward valid alternatives. Additionally, Contrastive Ranking (CR) enables fine-grained comparison of candidate plans by evaluating them jointly. Conceptually, SymPlanner operationalizes two cognitive faculties: (i) error monitoring and repair via externalized feedback (IC) and (ii) preference formation among alternatives via pairwise comparison (CR), advancing cognitively plausible, symbol-grounded planning aligned with the rich structure in intelligent systems. We evaluate SymPlanner on PlanBench, demonstrating that it produces more coherent, diverse, and verifiable plans than pure natural language baselines.
Optimal Single-Policy Sample Complexity and Transient Coverage for Average-Reward Offline RL
Zurek, Matthew, Zamir, Guy, Chen, Yudong
We study offline reinforcement learning in average-reward MDPs, which presents increased challenges from the perspectives of distribution shift and non-uniform coverage, and has been relatively underexamined from a theoretical perspective. While previous work obtains performance guarantees under single-policy data coverage assumptions, such guarantees utilize additional complexity measures which are uniform over all policies, such as the uniform mixing time. We develop sharp guarantees depending only on the target policy, specifically the bias span and a novel policy hitting radius, yielding the first fully single-policy sample complexity bound for average-reward offline RL. We are also the first to handle general weakly communicating MDPs, contrasting restrictive structural assumptions made in prior work. To achieve this, we introduce an algorithm based on pessimistic discounted value iteration enhanced by a novel quantile clipping technique, which enables the use of a sharper empirical-span-based penalty function. Our algorithm also does not require any prior parameter knowledge for its implementation. Remarkably, we show via hard examples that learning under our conditions requires coverage assumptions beyond the stationary distribution of the target policy, distinguishing single-policy complexity measures from previously examined cases. We also develop lower bounds nearly matching our main result.