ini
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
- Oceania > Australia > New South Wales > Sydney (0.04)
- North America > United States > California (0.04)
Schrödinger bridge for generative AI: Soft-constrained formulation and convergence analysis
Ma, Jin, Tan, Ying, Xu, Renyuan
Generative AI can be framed as the problem of learning a model that maps simple reference measures into complex data distributions, and it has recently found a strong connection to the classical theory of the Schrödinger bridge problems (SBPs) due partly to their common nature of interpolating between prescribed marginals via entropy-regularized stochastic dynamics. However, the classical SBP enforces hard terminal constraints, which often leads to instability in practical implementations, especially in high-dimensional or data-scarce regimes. To address this challenge, we follow the idea of the so-called soft-constrained Schrödinger bridge problem (SCSBP), in which the terminal constraint is replaced by a general penalty function. This relaxation leads to a more flexible stochastic control formulation of McKean-Vlasov type. We establish the existence of optimal solutions for all penalty levels and prove that, as the penalty grows, both the controls and value functions converge to those of the classical SBP at a linear rate. Our analysis builds on Doob's h-transform representations, the stability results of Schrödinger potentials, Gamma-convergence, and a novel fixed-point argument that couples an optimization problem over the space of measures with an auxiliary entropic optimal transport problem. These results not only provide the first quantitative convergence guarantees for soft-constrained bridges but also shed light on how penalty regularization enables robust generative modeling, fine-tuning, and transfer learning.
- North America > United States > California > Los Angeles County > Los Angeles (0.28)
- North America > United States > California > Santa Barbara County > Santa Barbara (0.14)
- North America > United States > Rhode Island > Providence County > Providence (0.04)
- (2 more...)
- North America > Canada > Ontario > Toronto (0.14)
- North America > Canada > Quebec > Montreal (0.04)
- Asia (0.04)
Instance-Dependent Continuous-Time Reinforcement Learning via Maximum Likelihood Estimation
Zhao, Runze, Yu, Yue, Wang, Ruhan, Huang, Chunfeng, Zhou, Dongruo
Continuous-time reinforcement learning (CTRL) provides a natural framework for sequential decision-making in dynamic environments where interactions evolve continuously over time. While CTRL has shown growing empirical success, its ability to adapt to varying levels of problem difficulty remains poorly understood. In this work, we investigate the instance-dependent behavior of CTRL and introduce a simple, model-based algorithm built on maximum likelihood estimation (MLE) with a general function approximator. Unlike existing approaches that estimate system dynamics directly, our method estimates the state marginal density to guide learning. We establish instance-dependent performance guarantees by deriving a regret bound that scales with the total reward variance and measurement resolution. Notably, the regret becomes independent of the specific measurement strategy when the observation frequency adapts appropriately to the problem's complexity. To further improve performance, our algorithm incorporates a randomized measurement schedule that enhances sample efficiency without increasing measurement cost. These results highlight a new direction for designing CTRL algorithms that automatically adjust their learning behavior based on the underlying difficulty of the environment.
- North America > United States > Indiana > Monroe County > Bloomington (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Uncertainty > Bayesian Inference (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (1.00)
From Generative to Episodic: Sample-Efficient Replicable Reinforcement Learning
Hopkins, Max, Liu, Sihan, Ye, Christopher, Yoshida, Yuichi
The epidemic failure of replicability across empirical science and machine learning has recently motivated the formal study of replicable learning algorithms [Impagliazzo et al. (2022)]. In batch settings where data comes from a fixed i.i.d. source (e.g., hypothesis testing, supervised learning), the design of data-efficient replicable algorithms is now more or less understood. In contrast, there remain significant gaps in our knowledge for control settings like reinforcement learning where an agent must interact directly with a shifting environment. Karbasi et. al show that with access to a generative model of an environment with $S$ states and $A$ actions (the RL 'batch setting'), replicably learning a near-optimal policy costs only $\tilde{O}(S^2A^2)$ samples. On the other hand, the best upper bound without a generative model jumps to $\tilde{O}(S^7 A^7)$ [Eaton et al. (2024)] due to the substantial difficulty of environment exploration. This gap raises a key question in the broader theory of replicability: Is replicable exploration inherently more expensive than batch learning? Is sample-efficient replicable RL even possible? In this work, we (nearly) resolve this problem (for low-horizon tabular MDPs): exploration is not a significant barrier to replicable learning! Our main result is a replicable RL algorithm on $\tilde{O}(S^2A)$ samples, bridging the gap between the generative and episodic settings. We complement this with a matching $\tildeΩ(S^2A)$ lower bound in the generative setting (under the common parallel sampling assumption) and an unconditional lower bound in the episodic setting of $\tildeΩ(S^2)$ showcasing the near-optimality of our algorithm with respect to the state space $S$.
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > California > San Diego County > San Diego (0.04)
- Europe > United Kingdom > England > Greater London > London (0.04)
- (2 more...)
- Workflow (0.92)
- Research Report (0.64)
UIPE: Enhancing LLM Unlearning by Removing Knowledge Related to Forgetting Targets
Wang, Wenyu, Zhang, Mengqi, Ye, Xiaotian, Ren, Zhaochun, Chen, Zhumin, Ren, Pengjie
Large Language Models (LLMs) inevitably acquire harmful information during training on massive datasets. LLM unlearning aims to eliminate the influence of such harmful information while maintaining the model's overall performance. Existing unlearning methods, represented by gradient ascent-based approaches, primarily focus on forgetting target data while overlooking the crucial impact of logically related knowledge on the effectiveness of unlearning. In this paper, through both theoretical and experimental analyses, we first demonstrate that a key reason for the suboptimal unlearning performance is that models can reconstruct the target content through reasoning with logically related knowledge. To address this issue, we propose Unlearning Improvement via Parameter Extrapolation (UIPE), a method that removes knowledge highly correlated with the forgetting targets. Experimental results show that UIPE significantly enhances the performance of various mainstream LLM unlearning methods on the TOFU benchmark.
- Information Technology > Security & Privacy (1.00)
- Health & Medicine (0.69)
Reference-Steering via Data-Driven Predictive Control for Hyper-Accurate Robotic Flying-Hopping Locomotion
Zeng, Yicheng, Huang, Yuhao, Xiong, Xiaobin
State-of-the-art model-based control designs have been shown to be successful in realizing dynamic locomotion behaviors for robotic systems. The precision of the realized behaviors in terms of locomotion performance via fly, hopping, or walking has not yet been well investigated, despite the fact that the difference between the robot model and physical hardware is doomed to produce inaccurate trajectory tracking. To address this inaccuracy, we propose a referencing-steering method to bridge the model-to-real gap by establishing a data-driven input-output (DD-IO) model on top of the existing model-based design. The DD-IO model takes the reference tracking trajectories as the input and the realized tracking trajectory as the output. By utilizing data-driven predictive control, we steer the reference input trajectories online so that the realized output ones match the actual desired ones. We demonstrate our method on the robot PogoX to realize hyper-accurate hopping and flying behaviors in both simulation and hardware. This data-driven reference-steering approach is straightforward to apply to general robotic systems for performance improvement via hyper-accurate trajectory tracking.
Achieving O(1/N) Optimality Gap in Restless Bandits through Diffusion Approximation
Yan, Chen, Wang, Weina, Ying, Lei
The Restless Multi-Armed Bandit (RMAB) problem is a fundamental framework in decision theory and operations research, where a decision maker must choose which among multiple tasks (arms) to work on (pull) at each time step in order to maximize cumulative reward [24]. Unlike the classic bandit problem [14], in the restless variant, the state of each arm evolves stochastically regardless of whether it is pulled. This problem has gained significant attention due to its applicability in various domains where optimal decision-making under uncertainty is critical, such as machine maintenance [11], target tracking [17], network communication [18] and clinic trials [22], to name a few. Despite its relevance, the RMAB problem is known to be PSPACE-hard [19], and finding optimal policies is computationally challenging, especially when the number of arms N is large. In this paper, we focus on the finite horizon version of the RMAB problem with N homogeneous arms and horizon H, where each arm follows the same (time-dependent) state transition and reward function. While computing the exact optimal policy is impractical, the homogeneity of the model allows for the design of efficient heuristic policies. One such class of heuristics is based on fluid approximation, which transforms the original N-armed RMAB problem into a Linear Program (LP).
- North America > United States > Pennsylvania > Allegheny County > Pittsburgh (0.04)
- North America > United States > Michigan > Washtenaw County > Ann Arbor (0.04)
- North America > United States > Massachusetts > Middlesex County > Belmont (0.04)
- (2 more...)
A Universal Multi-Vehicle Cooperative Decision-Making Approach in Structured Roads by Mixed-Integer Potential Game
Meng, Chengzhen, Huang, Zhenmin, Ma, Jun
Due to the intricate of real-world road topologies and the inherent complexity of autonomous vehicles, cooperative decision-making for multiple connected autonomous vehicles (CAVs) remains a significant challenge. Currently, most methods are tailored to specific scenarios, and the efficiency of existing optimization and learning methods applicable to diverse scenarios is hindered by the complexity of modeling and data dependency, which limit their real-world applicability. To address these issues, this paper proposes a universal multi-vehicle cooperative decision-making method in structured roads with game theory. We transform the decision-making problem into a graph path searching problem within a way-point graph framework. The problem is formulated as a mixed-integer linear programming problem (MILP) first and transformed into a mixed-integer potential game (MIPG), which reduces the scope of problem and ensures that no player needs to sacrifice for the overall cost. Two Gauss-Seidel algorithms for cooperative decision-making are presented to solve the MIPG problem and obtain the Nash equilibrium solutions. Specifically, the sequential Gauss-Seidel algorithm for cooperative decision-making considers the varying degrees of CAV interactions and flexibility in adjustment strategies to determine optimization priorities, which reduces the frequency of ineffective optimizations. Experimental evaluations across various urban traffic scenarios with different topological structures demonstrate the effectiveness and efficiency of the proposed method compared with MILP and comparisons of different optimization sequences validate the efficiency of the sequential Gauss-Seidel algorithm for cooperative decision-making.
- Transportation > Ground > Road (1.00)
- Automobiles & Trucks (0.68)
New Directions in Text Classification Research: Maximizing The Performance of Sentiment Classification from Limited Data
Agustian, Surya, Syah, Muhammad Irfan, Fatiara, Nurul, Abdillah, Rahmad
The stakeholders' needs in sentiment analysis for various issues, whether positive or negative, are speed and accuracy. One new challenge in sentiment analysis tasks is the limited training data, which often leads to suboptimal machine learning models and poor performance on test data. This paper discusses the problem of text classification based on limited training data (300 to 600 samples) into three classes: positive, negative, and neutral. A benchmark dataset is provided for training and testing data on the issue of Kaesang Pangarep's appointment as Chairman of PSI. External data for aggregation and augmentation purposes are provided, consisting of two datasets: the topic of Covid Vaccination sentiment and an open topic. The official score used is the F1-score, which balances precision and recall among the three classes, positive, negative, and neutral. A baseline score is provided as a reference for researchers for unoptimized classification methods. The optimized score is provided as a reference for the target score to be achieved by any proposed method. Both scoring (baseline and optimized) use the SVM method, which is widely reported as the state-of-the-art in conventional machine learning methods. The F1-scores achieved by the baseline and optimized methods are 40.83% and 51.28%, respectively.
- North America > United States (0.05)
- Asia > Indonesia > Sumatra > Riau (0.04)
- Information Technology > Artificial Intelligence > Natural Language > Text Classification (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Information Extraction (0.91)
- Information Technology > Artificial Intelligence > Natural Language > Discourse & Dialogue (0.91)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.34)