Goto

Collaborating Authors

 Education


Learning Action and Reasoning-Centric Image Editing from Videos and Simulations

Neural Information Processing Systems

Object, attribute or stylistic changes can be learned from visually static datasets. On the other hand, high-quality data for action and reasoning-centric edits is scarce and has to come from entirely different sources that cover e.g.




Generalized Robust Adaptive-Bandwidth Multi-View Manifold Learning in High Dimensions with Noise

arXiv.org Machine Learning

Multiview datasets are common in scientific and engineering applications, yet existing fusion methods offer limited theoretical guarantees, particularly in the presence of heterogeneous and high-dimensional noise. We propose Generalized Robust Adaptive-Bandwidth Multiview Diffusion Maps (GRAB-MDM), a new kernel-based diffusion geometry framework for integrating multiple noisy data sources. The key innovation of GRAB-MDM is a {view}-dependent bandwidth selection strategy that adapts to the geometry and noise level of each view, enabling a stable and principled construction of multiview diffusion operators. Under a common-manifold model, we establish asymptotic convergence results and show that the adaptive bandwidths lead to provably robust recovery of the shared intrinsic structure, even when noise levels and sensor dimensions differ across views. Numerical experiments demonstrate that GRAB-MDM significantly improves robustness and embedding quality compared with fixed-bandwidth and equal-bandwidth baselines, and usually outperform existing algorithms. The proposed framework offers a practical and theoretically grounded solution for multiview sensor fusion in high-dimensional noisy environments.


OSIL: Learning Offline Safe Imitation Policies with Safety Inferred from Non-preferred Trajectories

arXiv.org Machine Learning

This work addresses the problem of offline safe imitation learning (IL), where the goal is to learn safe and reward-maximizing policies from demonstrations that do not have per-timestep safety cost or reward information. In many real-world domains, online learning in the environment can be risky, and specifying accurate safety costs can be difficult. However, it is often feasible to collect trajectories that reflect undesirable or unsafe behavior, implicitly conveying what the agent should avoid. We refer to these as non-preferred trajectories. We propose a novel offline safe IL algorithm, OSIL, that infers safety from non-preferred demonstrations. We formulate safe policy learning as a Constrained Markov Decision Process (CMDP). Instead of relying on explicit safety cost and reward annotations, OSIL reformulates the CMDP problem by deriving a lower bound on reward maximizing objective and learning a cost model that estimates the likelihood of non-preferred behavior. Our approach allows agents to learn safe and reward-maximizing behavior entirely from offline demonstrations. We empirically demonstrate that our approach can learn safer policies that satisfy cost constraints without degrading the reward performance, thus outperforming several baselines.





Gradient-Variation Online Learning under Generalized Smoothness

Neural Information Processing Systems

Gradient-variation online learning aims to achieve regret guarantees that scale with variations in the gradients of online functions, which is crucial for attaining fast convergence in games and robustness in stochastic o ptimization, hence receiving increased attention. Existing results often req uire the smoothness condition by imposing a fixed bound on gradient Lipschitzness, w hich may be unrealistic in practice. Recent efforts in neural network optim ization suggest a generalized smoothness condition, allowing smoothness to correlate with gradient norms. In this paper, we systematically study gradient-var iation online learning under generalized smoothness. We extend the classic optimi stic mirror descent algorithm to derive gradient-variation regret by analyzin g stability over the optimization trajectory and exploiting smoothness locally. Th en, we explore universal online learning, designing a single algorithm with the optimal gradient-va riation regrets for convex and strongly convex functions simultane ously, without requiring prior knowledge of curvature. This algorithm adopts a tw o-layer structure with a meta-algorithm running over a group of base-learners . To ensure favorable guarantees, we design a new Lipschitz-adaptive meta-a lgorithm, capable of handling potentially unbounded gradients while ensuring a second-order bound to effectively ensemble the base-learners. Finally, we provi de the applications for fast-rate convergence in games and stochastic extended adv ersarial optimization.