stl
other comments in the paper if accepted
We appreciate the valuable comments from the reviewers. We will answer reviewers' questions from three aspects, i.e., In response to Reviewer 5, this paper's major novelty is developing a new STL-based learning framework to Our method creates a practical way to ensure the logic rules' satisfaction in an end-to-end manner. Our approach achieves promising results on real city datasets, i.e., significantly We have carefully compared our work with all the related papers pointed out by the reviewers. Therefore, we also choose STL to express the model properties. Using STL to specify CPS properties is not our novelty.
From STLS to Projection-based Dictionary Selection in Sparse Regression for System Identification
Cho, Hangjun, Amaral, Fabio V. G., Klishin, Andrei A., Oishi, Cassio M., Brunton, Steven L.
In this work, we revisit dictionary-based sparse regression, in particular, Sequential Threshold Least Squares (STLS), and propose a score-guided library selection to provide practical guidance for data-driven modeling, with emphasis on SINDy-type algorithms. STLS is an algorithm to solve the $\ell_0$ sparse least-squares problem, which relies on splitting to efficiently solve the least-squares portion while handling the sparse term via proximal methods. It produces coefficient vectors whose components depend on both the projected reconstruction errors, here referred to as the scores, and the mutual coherence of dictionary terms. The first contribution of this work is a theoretical analysis of the score and dictionary-selection strategy. This can be understood in both the original and weak SINDy regimes. Second, numerical experiments on ordinary and partial differential equations highlight the effectiveness of score-based screening, improving both accuracy and interpretability in dynamical system identification. These results suggest that integrating score-guided methods to refine the dictionary more accurately may help SINDy users in some cases to enhance their robustness for data-driven discovery of governing equations.
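The alternation described in this abstract (a least-squares fit followed by hard thresholding of small coefficients) can be sketched as follows. This is a minimal illustration of plain STLS only, not the score-guided library selection the paper proposes. The function names, the normal-equation solver, the threshold value, and the toy data are all assumptions made for the example.

```python
def solve_least_squares(A, b):
    """Least-squares solution of A x ~ b via the normal equations.

    Adequate for this tiny demo; a QR- or SVD-based solver is preferable
    for ill-conditioned libraries.
    """
    n = len(A[0])
    # Augmented system [A^T A | A^T b].
    M = [
        [sum(row[i] * row[j] for row in A) for j in range(n)]
        + [sum(row[i] * bi for row, bi in zip(A, b))]
        for i in range(n)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                factor = M[r][col] / M[col][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]


def stls(library, target, threshold, max_iter=10):
    """Sequentially thresholded least squares (hypothetical helper).

    library: rows of candidate-term evaluations, one column per term.
    target:  measured derivative values, one per row.
    """
    n_terms = len(library[0])
    active = list(range(n_terms))
    coef = [0.0] * n_terms
    for _ in range(max_iter):
        if not active:
            break
        # Refit using only the terms that survived earlier thresholding.
        sub = [[row[j] for j in active] for row in library]
        fitted = solve_least_squares(sub, target)
        coef = [0.0] * n_terms
        for j, c in zip(active, fitted):
            coef[j] = c
        keep = [j for j in active if abs(coef[j]) >= threshold]
        if keep == active:
            break  # support stopped changing
        active = keep
    # Zero out anything still below the threshold.
    return [c if abs(c) >= threshold else 0.0 for c in coef]


# Toy data (assumed): dx/dt = -2 x plus a small deterministic perturbation,
# with candidate library [1, x, x^2].
xs = [-1.0, -0.5, 0.0, 0.5, 1.0, 1.5]
library = [[1.0, x, x * x] for x in xs]
target = [-2.0 * x + 0.01 * (-1) ** i for i, x in enumerate(xs)]
coef = stls(library, target, threshold=0.1)
print(coef)
```

On this toy problem the constant and quadratic terms are pruned and only the linear term survives, with a coefficient close to -2. The threshold is the main tuning knob: too large removes true terms, too small keeps spurious ones.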
Maximum Mean Discrepancy with Unequal Sample Sizes via Generalized U-Statistics
Wei, Aaron, Jalali, Milad, Sutherland, Danica J.
Existing two-sample testing techniques, particularly those based on choosing a kernel for the Maximum Mean Discrepancy (MMD), often assume equal sample sizes from the two distributions. Applying these methods in practice can require discarding valuable data, unnecessarily reducing test power. We address this long-standing limitation by extending the theory of generalized U-statistics and applying it to the usual MMD estimator, resulting in a new characterization of the asymptotic distributions of the MMD estimator with unequal sample sizes (particularly outside the proportional regimes required by previous partial results). This generalization also provides a new criterion for optimizing the power of an MMD test with unequal sample sizes. Our approach preserves all available data, enhancing test accuracy and applicability in realistic settings. Along the way, we give much cleaner characterizations of the variance of MMD estimators, revealing something that might be surprising to those in the area: while zero MMD implies a degenerate estimator, it is sometimes possible to have a degenerate estimator with nonzero MMD as well; we give a construction and a proof that it does not happen in common situations.
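For reference, the standard unbiased estimate of squared MMD is already well defined when the two samples have different sizes m and n; it averages the within-sample kernel values over distinct pairs and subtracts twice the cross-sample average. The sketch below assumes scalar samples and a Gaussian kernel with bandwidth 1, both chosen only for illustration. It shows the estimator itself, not the asymptotic theory or the power criterion developed in the paper, and the function names are hypothetical.

```python
import math


def gaussian_kernel(x, y, bandwidth=1.0):
    """Gaussian (RBF) kernel on scalars; the bandwidth is an assumed default."""
    return math.exp(-((x - y) ** 2) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(xs, ys, kernel=gaussian_kernel):
    """Unbiased estimate of squared MMD for samples of sizes m and n.

    Within-sample terms exclude the diagonal (i == j); the cross term
    uses all m * n pairs. No requirement that m == n.
    """
    m, n = len(xs), len(ys)
    if m < 2 or n < 2:
        raise ValueError("each sample needs at least two points")
    xx = sum(kernel(xs[i], xs[j]) for i in range(m) for j in range(m) if i != j)
    yy = sum(kernel(ys[i], ys[j]) for i in range(n) for j in range(n) if i != j)
    xy = sum(kernel(x, y) for x in xs for y in ys)
    return xx / (m * (m - 1)) + yy / (n * (n - 1)) - 2.0 * xy / (m * n)


# Toy samples (assumed) with m = 3 and n = 5, drawn far apart.
sample_x = [0.0, 0.1, 0.2]
sample_y = [5.0, 5.1, 5.2, 5.3, 5.4]
estimate = mmd2_unbiased(sample_x, sample_y)
print(estimate)
```

Because the estimator is unbiased, it can come out slightly negative when the two samples come from the same distribution. All points from both samples are used, so nothing is discarded to equalize the sizes.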
See, Think, Learn: A Self-Taught Multimodal Reasoner
Sharma, Sourabh, Gupta, Sonam, Sadbhawna
Vision-Language Models (VLMs) have achieved remarkable progress in integrating visual perception with language understanding. However, effective multimodal reasoning requires both accurate perception and robust reasoning, and weakness in either limits the performance of VLMs. Prior efforts to enhance reasoning often depend on high-quality chain-of-thought (CoT) data, obtained via labor-intensive human annotations, costly proprietary models, or self-training methods that overlook perception. To address these limitations, we propose a simple yet effective self-training framework called See-Think-Learn (STL). At its core, STL introduces a structured reasoning template that encourages the model to see before thinking, first extracting visual attributes in textual form, then using them to guide reasoning. The framework jointly improves perception and reasoning by having the model generate and learn from its own structured rationales in a self-training loop. Furthermore, we augment the training data with negative rationales, i.e., explanations that justify why certain answer choices are incorrect, to enhance the model's ability to distinguish between correct and misleading responses. This fosters more discriminative and robust learning. Experiments across diverse domains show that STL consistently outperforms baselines trained directly only on answers or self-generated reasoning, while qualitative analysis confirms the high quality of its rationales. STL thus provides a cost-effective solution to enhance the multimodal reasoning ability of VLMs.
Achieving Safe Control Online through Integration of Harmonic Control Lyapunov-Barrier Functions with Unsafe Object-Centric Action Policies
Fawn, Marlow, Scheutz, Matthias
Open-world environments pose many challenges for autonomous robots as unexpected events or task modulations can make learned robot behavior inapplicable or obsolete. Consider, for example, a robot that has learned to autonomously perform a sorting task on a tabletop without any human intervention when a human co-worker steps in to help with finishing the task. This change in task environment now requires the robot to avoid colliding with the human whose arms are extended into the robot's workspace and are dynamically changing position. Even if the robot has the perceptual capability to detect and track the human's arms and hands, its trained action policy does not provide a way to account for the motion constraints they impose. Or consider a delivery robot in a warehouse that has an optimized policy for traversing indoor spaces when dynamic constraints are imposed on where it can drive (e.g., because parts of the floor are painted).
TGPO: Temporal Grounded Policy Optimization for Signal Temporal Logic Tasks
Meng, Yue, Chen, Fei, Fan, Chuchu
Learning control policies for complex, long-horizon tasks is a central challenge in robotics and autonomous systems. Signal Temporal Logic (STL) offers a powerful and expressive language for specifying such tasks, but its non-Markovian nature and inherently sparse reward make it difficult to solve with standard Reinforcement Learning (RL) algorithms. Prior RL approaches focus only on limited STL fragments or use STL robustness scores as sparse terminal rewards. In this paper, we propose TGPO, Temporal Grounded Policy Optimization, to solve general STL tasks. TGPO decomposes STL into timed subgoals and invariant constraints and provides a hierarchical framework to tackle the problem. The high-level component of TGPO proposes concrete time allocations for these subgoals, and the low-level time-conditioned policy learns to achieve the sequenced subgoals using a dense, stage-wise reward signal. During inference, we sample various time allocations and select the most promising assignment for the policy network to roll out the solution trajectory. To foster efficient policy learning for complex STL with multiple subgoals, we leverage the learned critic to guide the high-level temporal search via Metropolis-Hastings sampling, focusing exploration on temporally feasible solutions. We conduct experiments on five environments, ranging from low-dimensional navigation to manipulation, drone, and quadrupedal locomotion. Under a wide range of STL tasks, TGPO significantly outperforms state-of-the-art baselines (especially for high-dimensional and long-horizon cases), with an average of 31.6% improvement in task success rate compared to the best baseline. The code will be available at https://github.com/mengyuest/TGPO
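To make the "STL robustness score" mentioned in this abstract concrete, the sketch below evaluates the standard quantitative semantics for two simple formulas on a discrete-time scalar trajectory: "eventually" takes the maximum predicate margin over a time window and "always" takes the minimum. The helper names, the trajectory, and the thresholds are assumptions for illustration; this is not TGPO's subgoal decomposition or reward design.

```python
def predicate_margin(signal, threshold):
    """Margin of the predicate x_t > threshold at every time step."""
    return [value - threshold for value in signal]


def eventually(margins, start, end):
    """Robustness of F_[start,end] phi: best margin inside the window."""
    return max(margins[start:end + 1])


def always(margins, start, end):
    """Robustness of G_[start,end] phi: worst margin inside the window."""
    return min(margins[start:end + 1])


# Assumed toy trajectory of a scalar state x over four time steps.
trajectory = [0.0, 0.5, 1.2, 0.8]

# F_[0,3] (x > 1.0): satisfied, because x exceeds 1.0 at step 2.
reach_score = eventually(predicate_margin(trajectory, 1.0), 0, 3)

# G_[0,3] (x > -0.5): satisfied, because x stays above -0.5 throughout.
safe_score = always(predicate_margin(trajectory, -0.5), 0, 3)

# G_[0,3] (x > 1.0): violated, because x starts at 0.0.
stay_score = always(predicate_margin(trajectory, 1.0), 0, 3)

print(reach_score, safe_score, stay_score)
```

A positive score means the formula is satisfied and a negative score means it is violated, with the magnitude indicating the margin. Each score depends on the whole window of the trajectory, which is why using it directly as a reward yields a signal only at the end of an episode.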
Learning to Route: Per-Sample Adaptive Routing for Multimodal Multitask Prediction
Ajirak, Marzieh, Bein, Oded, Bowen, Ellen Rose, Kanellopoulos, Dora, Falk, Avital, Gunning, Faith M., Solomonov, Nili, Grosenick, Logan
We propose a unified framework for adaptive routing in multitask, multimodal prediction settings where data heterogeneity and task interactions vary across samples. Motivated by applications in psychotherapy where structured assessments and unstructured clinician notes coexist with partially missing data and correlated outcomes, we introduce a routing-based architecture that dynamically selects modality processing pathways and task-sharing strategies on a per-sample basis. Our model defines multiple modality paths, including raw and fused representations of text and numeric features, and learns to route each input through the most informative expert combination. Task-specific predictions are produced by shared or independent heads depending on the routing decision, and the entire system is trained end-to-end. We evaluate the model on both synthetic data and real-world psychotherapy notes predicting depression and anxiety outcomes. Our experiments show that our method consistently outperforms fixed multitask or single-task baselines, and that the learned routing policy provides interpretable insights into modality relevance and task structure. This addresses critical challenges in personalized healthcare by enabling per-subject adaptive information processing that accounts for data heterogeneity and task correlations. Applied to psychotherapy, this framework could improve mental health outcomes, enhance treatment assignment precision, and increase clinical cost-effectiveness through personalized intervention strategies.