Goto

Collaborating Authors

 sth


Optimal Posterior Sampling for Policy Identification in Tabular Markov Decision Processes

arXiv.org Machine Learning

We study the $(\varepsilon, ฮด)$-PAC policy identification problem in finite-horizon episodic Markov Decision Processes. Existing approaches provide finite-time guarantees for approximate settings ($\varepsilon>0$) but suffer from high computational cost, rendering them hard to implement, and also suffer from suboptimal dependence on $\log(1/ฮด)$. We propose a randomized and computationally efficient algorithm for best policy identification that combines posterior sampling with an online learning algorithm to guide exploration in the MDP. Our method achieves asymptotic optimality in sample complexity, also in terms of posterior contraction rate, and runs in $O(S^2AH)$ per episode, matching standard model-based approaches. Unlike prior algorithms such as MOCA and PEDEL, our guarantees remain meaningful in the asymptotic regime and avoid sub-optimal polynomial dependence on $\log(1/ฮด)$. Our results provide both theoretical insights and practical tools for efficient policy identification in tabular MDPs.


DARLING: Detection Augmented Reinforcement Learning with Non-Stationary Guarantees

arXiv.org Machine Learning

We study model-free reinforcement learning (RL) in non-stationary finite-horizon episodic Markov decision processes (MDPs) without prior knowledge of the non-stationarity. We focus on the piecewise-stationary (PS) setting, where both the reward and transition dynamics can change an arbitrary number of times. We propose Detection Augmented Reinforcement Learning (DARLING), a modular wrapper for PS-RL that applies to both tabular and linear MDPs, without knowledge of the changes. Under certain change-point separation and reachability conditions, DARLING improves the best available dynamic regret bounds in both settings and yields strong empirical performance. We further establish the first minimax lower bounds for PS-RL in tabular and linear MDPs, showing that DARLING is the first nearly optimal algorithm. Experiments on standard benchmarks demonstrate that DARLING consistently surpasses the state-of-the-art methods across diverse non-stationary scenarios.


MobILE: Model-BasedImitationLearning From ObservationAlone

Neural Information Processing Systems

Weprovide aunified analysis for MobILE, and demonstrate that MobILE enjoys strong performance guarantees for classes of MDP dynamics that satisfy certain well studied notions of structural complexity. We also show that the ILFO problem isstrictly harder than the standard IL problem by presenting an exponential sample complexity separation between ILand ILFO.


UnpackingRewardShaping

Neural Information Processing Systems

Much of this work is based on upper confidence bound (UCB) principles and prescribes some kind of exploration bonus to prioritize exploration of rarely visited regions.


A Selective Temporal Hamming distance to find patterns in state transition event timeseries, at scale

arXiv.org Machine Learning

Discrete event systems are present both in observations of nature, socio economical sciences, and industrial systems. Standard analysis approaches do not usually exploit their dual event / state nature: signals are either modeled as transition event sequences, emphasizing event order alignment, or as categorical or ordinal state timeseries, usually resampled a distorting and costly operation as the observation period and number of events grow. In this work we define state transition event timeseries (STE-ts) and propose a new Selective Temporal Hamming distance (STH) leveraging both transition time and duration-in-state, avoiding costly and distorting resampling on large databases. STH generalizes both resampled Hamming and Jaccard metrics with better precision and computation time, and an ability to focus on multiple states of interest. We validate these benefits on simulated and real-world datasets.


Scheduling Agile Earth Observation Satellites with Onboard Processing and Real-Time Monitoring

arXiv.org Artificial Intelligence

--The emergence of Agile Earth Observation Satellites (AEOSs) has marked a significant turning point in the field of Earth Observation (EO), offering enhanced flexibility in data acquisition. Concurrently, advancements in onboard satellite computing and communication technologies have greatly enhanced data compression efficiency, reducing network latency and congestion while supporting near real-time information delivery. In this paper, we address the Agile Earth Observation Satellite Scheduling Problem (AEOSSP), which involves determining the optimal sequence of target observations to maximize overall observation profit. T o this end, we define a set of priority indicators and develop a constructive heuristic method, further enhanced with a Local Search (LS) strategy. The results show that the proposed algorithm provides high-quality information by increasing the resolution of the collected frames by up to 10% on average, while reducing the variance in the monitoring frequency of the targets within the instance by up to 83%, ensuring more up-to-date information across the entire set compared to a First-In First-Out (FIFO) method.


SOP-Agent: Empower General Purpose AI Agent with Domain-Specific SOPs

arXiv.org Artificial Intelligence

Despite significant advancements in general-purpose AI agents, several challenges still hinder their practical application in real-world scenarios. First, the limited planning capabilities of Large Language Models (LLM) restrict AI agents from effectively solving complex tasks that require long-horizon planning. Second, general-purpose AI agents struggle to efficiently utilize domain-specific knowledge and human expertise. In this paper, we introduce the Standard Operational Procedure-guided Agent (SOP-agent), a novel framework for constructing domain-specific agents through pseudocode-style Standard Operational Procedures (SOPs) written in natural language. Formally, we represent a SOP as a decision graph, which is traversed to guide the agent in completing tasks specified by the SOP. We conduct extensive experiments across tasks in multiple domains, including decision-making, search and reasoning, code generation, data cleaning, and grounded customer service. The SOP-agent demonstrates excellent versatility, achieving performance superior to general-purpose agent frameworks and comparable to domain-specific agent systems. Additionally, we introduce the Grounded Customer Service Benchmark, the first benchmark designed to evaluate the grounded decision-making capabilities of AI agents in customer service scenarios based on SOPs.


Different thresholding methods on Nearest Shrunken Centroid algorithm

arXiv.org Machine Learning

This article considers the impact of different thresholding methods to the Nearest Shrunken Centroid algorithm, which is popularly referred as the Prediction Analysis of Microarrays (PAM) for high-dimensional classification. PAM uses soft thresholding to achieve high computational efficiency and high classification accuracy but in the price of retaining too many features. When applied to microarray human cancers, PAM selected 2611 features on average from 10 multi-class datasets. Such a large number of features make it difficult to perform follow up study. One reason behind this problem is the soft thresholding, which is known to produce biased parameter estimate in regression analysis. In this article, we extend the PAM algorithm with two other thresholding methods, hard and order thresholding, and a deep search algorithm to achieve better thresholding parameter estimate. The modified algorithms are extensively tested and compared to the original one based on real data and Monte Carlo studies. In general, the modification not only gave better cancer status prediction accuracy, but also resulted in more parsimonious models with significantly smaller number of features.


Star-shaped Tilted Hexarotor Maneuverability: Analysis of the Role of the Tilt Cant Angles

arXiv.org Artificial Intelligence

Star-shaped Tilted Hexarotors are rapidly emerging for applications highly demanding in terms of robustness and maneuverability. To ensure improvement in such features, a careful selection of the tilt angles is mandatory. In this work, we present a rigorous analysis of how the force subspace varies with the tilt cant angles, namely the tilt angles along the vehicle arms, taking into account gravity compensation and torque decoupling to abide by the hovering condition. Novel metrics are introduced to assess the performance of existing tilted platforms, as well as to provide some guidelines for the selection of the tilt cant angle in the design phase.


Few-Shot Action Recognition with Compromised Metric via Optimal Transport

arXiv.org Artificial Intelligence

Although vital to computer vision systems, few-shot action recognition is still not mature despite the wide research of few-shot image classification. Popular few-shot learning algorithms extract a transferable embedding from seen classes and reuse it on unseen classes by constructing a metric-based classifier. One main obstacle to applying these algorithms in action recognition is the complex structure of videos. Some existing solutions sample frames from a video and aggregate their embeddings to form a video-level representation, neglecting important temporal relations. Others perform an explicit sequence matching between two videos and define their distance as matching cost, imposing too strong restrictions on sequence ordering. In this paper, we propose Compromised Metric via Optimal Transport (CMOT) to combine the advantages of these two solutions. CMOT simultaneously considers semantic and temporal information in videos under Optimal Transport framework, and is discriminative for both content-sensitive and ordering-sensitive tasks. In detail, given two videos, we sample segments from them and cast the calculation of their distance as an optimal transport problem between two segment sequences. To preserve the inherent temporal ordering information, we additionally amend the ground cost matrix by penalizing it with the positional distance between a pair of segments. Empirical results on benchmark datasets demonstrate the superiority of CMOT.