

Leveraging Conditional Dependence for Efficient World Model Denoising

Neural Information Processing Systems

Effective denoising is critical for managing complex visual inputs contaminated with noisy distractors in model-based reinforcement learning (RL). Current methods often oversimplify the decomposition of observations by neglecting the conditional dependence between task-relevant and task-irrelevant components given an observation. To address this limitation, we introduce CsDreamer, a model-based RL approach built upon the world model of Collider-structure Recurrent State-Space Model (CsRSSM). CsRSSM incorporates colliders to comprehensively model the denoising inference process and explicitly capture the conditional dependence. Furthermore, it employs a decoupling regularization to balance the influence of this conditional dependence. By accurately inferring a task-relevant state space, CsDreamer improves learning efficiency during rollouts. Experimental results demonstrate the effectiveness of CsRSSM in extracting task-relevant information, with CsDreamer outperforming existing approaches in environments characterized by complex noise interference.
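Why a collider creates conditional dependence can be checked on a toy example. In this hypothetical sketch (not CsRSSM itself), `a` stands for a task-relevant factor, `b` for an independent task-irrelevant distractor, and the observation `o = a OR b` is a collider of both; the names `collider_joint` and `prob`, the binary variables and the 0.5 priors are assumptions made for illustration.

```python
from itertools import product

def collider_joint(p_a: float = 0.5, p_b: float = 0.5) -> dict:
    """Joint table P(a, b, o) for independent binary causes a, b and collider o = a OR b.

    a plays the role of a task-relevant factor, b a task-irrelevant distractor,
    and o the observation both of them produce (a toy stand-in, not CsRSSM).
    """
    table = {}
    for a, b in product((0, 1), repeat=2):
        p = (p_a if a else 1 - p_a) * (p_b if b else 1 - p_b)
        table[(a, b, int(a or b))] = p
    return table

def prob(table: dict, **fixed: int) -> float:
    """Marginal probability that the named variables (a, b, o) take the given values."""
    index = {"a": 0, "b": 1, "o": 2}
    return sum(
        p for key, p in table.items()
        if all(key[index[name]] == value for name, value in fixed.items())
    )

t = collider_joint()
p_a = prob(t, a=1)                                           # 0.5
p_a_given_b = prob(t, a=1, b=1) / prob(t, b=1)               # 0.5: a, b independent
p_a_given_o = prob(t, a=1, o=1) / prob(t, o=1)               # 2/3
p_a_given_o_b = prob(t, a=1, b=1, o=1) / prob(t, b=1, o=1)   # 0.5: dependent given o
```

Marginally, knowing the distractor says nothing about the task-relevant factor, but once the observation is given, learning that `b` is active "explains away" part of the evidence for `a`. A decomposition that treats the two components as independent given the observation discards exactly this coupling.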


Unveiling the Power of Multiple Gossip Steps: A Stability-Based Generalization Analysis in Decentralized Training

Neural Information Processing Systems

Decentralized training removes the centralized server, making it a communication-efficient approach that can significantly improve training efficiency, but it often suffers from degraded performance compared to centralized training. Multi-Gossip Steps (MGS) serve as a simple yet effective bridge between decentralized and centralized training, significantly reducing the empirical performance gap. However, the theoretical reasons for its effectiveness, and whether MGS can fully eliminate this gap, remain open questions. In this paper, we derive upper bounds on the generalization error and excess error of MGS using stability analysis, systematically answering these two key questions.
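The effect of multiple gossip steps can be pictured with a small simulation. This is a hypothetical illustration, not the paper's analysis: it assumes a ring of `n` workers with a symmetric, doubly stochastic mixing matrix `W` (each worker averages itself and its two neighbours with weight 1/3), and the helper names `ring_mixing_matrix`, `gossip` and `consensus_error` are made up for this example. Each gossip step maps the stacked local parameters X to W X, so k steps apply W^k, which keeps the global average fixed while shrinking the disagreement between workers.

```python
import numpy as np

def ring_mixing_matrix(n: int) -> np.ndarray:
    """Symmetric, doubly stochastic mixing matrix for a ring of n >= 3 workers."""
    assert n >= 3, "with fewer than 3 nodes the two neighbours coincide"
    w = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i, i + 1):
            w[i, j % n] = 1.0 / 3.0  # average self and both ring neighbours
    return w

def gossip(params: np.ndarray, w: np.ndarray, steps: int) -> np.ndarray:
    """Apply `steps` rounds of gossip averaging; row i holds worker i's parameters."""
    for _ in range(steps):
        params = w @ params
    return params

def consensus_error(params: np.ndarray) -> float:
    """Squared distance of every worker's parameters from the global average."""
    mean = params.mean(axis=0, keepdims=True)
    return float(np.sum((params - mean) ** 2))

# Usage: 8 workers, each holding a 4-dimensional parameter vector.
rng = np.random.default_rng(0)
x = rng.normal(size=(8, 4))
W = ring_mixing_matrix(8)
errors = {k: consensus_error(gossip(x, W, k)) for k in (0, 1, 2, 4)}
```

The consensus error shrinks geometrically with the number of gossip steps, at a rate set by the second-largest eigenvalue magnitude of `W`. This is only the intuition for why MGS moves decentralized training toward centralized training; it does not reproduce the generalization or excess-error bounds derived in the paper.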




DPM-Solver: A Fast ODE Solver for Diffusion Probabilistic Model Sampling in Around 10 Steps

Neural Information Processing Systems

Diffusion probabilistic models (DPMs) are an emerging class of powerful generative models. Despite their high-quality generation performance, DPMs still suffer from slow sampling, as they generally need hundreds or thousands of sequential function evaluations (steps) of large neural networks to draw a sample. Sampling from DPMs can alternatively be viewed as solving the corresponding diffusion ordinary differential equations (ODEs). In this work, we propose an exact formulation of the solution of diffusion ODEs. The formulation analytically computes the linear part of the solution, rather than leaving all terms to the black-box ODE solvers adopted in previous works.
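In the DPM-Solver paper's notation (noise schedule alpha_t, sigma_t and lambda_t = log(alpha_t / sigma_t)), the exact solution from time s to time t is x_t = (alpha_t / alpha_s) * x_s - alpha_t * ∫_{lambda_s}^{lambda_t} exp(-lambda) * eps(x_lambda, lambda) dlambda. Only the integral involves the network; holding the noise prediction eps fixed over the step gives the first-order update x_t = (alpha_t / alpha_s) * x_s - sigma_t * (e^h - 1) * eps with h = lambda_t - lambda_s. The sketch below implements that single step with NumPy; the function names and the toy schedule values are assumptions, and `eps` stands in for a hypothetical noise-prediction network's output.

```python
import numpy as np

def half_log_snr(alpha: float, sigma: float) -> float:
    """lambda = log(alpha / sigma), the time variable DPM-Solver integrates over."""
    return float(np.log(alpha / sigma))

def dpm_solver_1_step(x_s, eps, alpha_s, sigma_s, alpha_t, sigma_t):
    """First-order update from time s to time t.

    The linear term (alpha_t / alpha_s) * x_s is exact; the noise term is
    approximated by holding the prediction eps fixed over the step.
    """
    h = half_log_snr(alpha_t, sigma_t) - half_log_snr(alpha_s, sigma_s)
    # expm1 keeps (e^h - 1) accurate when the step h is small
    return (alpha_t / alpha_s) * x_s - sigma_t * np.expm1(h) * eps

def ddim_step(x_s, eps, alpha_s, sigma_s, alpha_t, sigma_t):
    """Deterministic DDIM update via the predicted clean sample x0."""
    x0_pred = (x_s - sigma_s * eps) / alpha_s
    return alpha_t * x0_pred + sigma_t * eps

# Toy check on a variance-preserving schedule (alpha^2 + sigma^2 = 1),
# stepping from a noisier time s to a cleaner time t.
rng = np.random.default_rng(0)
x_s, eps = rng.normal(size=4), rng.normal(size=4)
a = dpm_solver_1_step(x_s, eps, 0.6, 0.8, 0.8, 0.6)
b = ddim_step(x_s, eps, 0.6, 0.8, 0.8, 0.6)
```

The two updates agree, consistent with the paper's observation that its first-order solver coincides with DDIM. The higher-order DPM-Solver variants, which are what reach good samples in around 10 steps, approximate the integral more accurately and are not shown here.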


Watch Your Step: Learning Node Embeddings via Graph Attention

Neural Information Processing Systems

Graph embedding methods represent nodes in a continuous vector space, preserving different types of relational information from the graph. There are many hyper-parameters to these methods (e.g., the length of a random walk) which have to be manually tuned for every graph. In this paper, we replace previously fixed hyper-parameters with trainable ones that we automatically learn via backpropagation. In particular, we propose a novel attention model on the power series of the transition matrix, which guides the random walk to optimize an upstream objective. Unlike previous approaches to attention models, the method that we propose applies attention parameters exclusively to the data itself (e.g., to the random walk), and these parameters are not used by the model for inference. We experiment on link prediction tasks, as we aim to produce embeddings that best preserve the graph structure and generalize to unseen information. We improve state-of-the-art results on a comprehensive suite of real-world graph datasets, including social, collaboration, and biological networks, where we observe that our graph attention model can reduce the error by up to 20-40%. We show that our automatically learned attention parameters can vary significantly per graph and correspond to the optimal choice of hyper-parameter if we manually tune existing methods.
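The attention over the power series of the transition matrix can be sketched in a few lines. In this simplified, forward-only version, trainable logits are turned into weights q = softmax(logits), one per walk distance k, and the expected context distribution is taken as sum_k q_k T^k, so the weights play the role of a hand-tuned context window. The function names, the number of powers, and the omission of the paper's training objective and random-walk counts are assumptions of this example.

```python
import numpy as np

def transition_matrix(adj: np.ndarray) -> np.ndarray:
    """Row-normalise an adjacency matrix into a random-walk transition matrix T.

    Assumes every node has at least one neighbour (no zero-degree rows).
    """
    deg = adj.sum(axis=1, keepdims=True)
    return adj / deg

def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def attended_context(adj: np.ndarray, logits) -> np.ndarray:
    """Expected context distribution sum_k q_k T^k with q = softmax(logits).

    logits[k-1] is the (trainable) score for walk distance k; in a full model
    these would be learned by backpropagation instead of tuned by hand.
    """
    t = transition_matrix(adj)
    q = softmax(np.asarray(logits, dtype=float))
    power = np.eye(adj.shape[0])
    out = np.zeros_like(t)
    for q_k in q:
        power = power @ t  # now holds T^k
        out += q_k * power
    return out

# Usage: a 4-cycle graph with attention over walk distances 1..3.
adj = np.array([[0, 1, 0, 1],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [1, 0, 1, 0]], dtype=float)
uniform = attended_context(adj, [0.0, 0.0, 0.0])
```

Because q is a convex combination and each T^k is row-stochastic, every row of the result is itself a distribution; pushing almost all attention onto k = 1 recovers T, which corresponds to a context window of one.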


MomentDiff: Generative Video Moment Retrieval from Random to Real (Supplementary Material)

Neural Information Processing Systems

Each video is annotated with an average of 2.4 moments, with [...] The dataset contains a total of 10,310 queries with 18,367 annotated moments. Then, we design the dataset Charades-STA-Mom based on the span's end time [...] Algorithm 1 provides the pseudo-code of MomentDiff training in a PyTorch-like style. Inference efficiency is critical for machine learning models. We report R1@0.5, R1@0.7 and mAP [...] Figure 1 shows the performance fluctuation of the model on the Charades-STA dataset. [...] Glove; SF+C, C;) to organize experiments. Therefore, we adopt DDIM as the default sampler.


FastDrag: Manipulate Anything in One Step

Neural Information Processing Systems

Drag-based image editing using generative models provides precise control over image contents, enabling users to manipulate anything in an image with a few clicks. However, prevailing methods typically adopt n-step iterations for latent semantic optimization to achieve drag-based image editing, which is time-consuming and limits practical applications. In this paper, we introduce a novel one-step drag-based image editing method, FastDrag, to accelerate the editing process. Central to our approach is a latent warpage function (LWF), which simulates the behavior of a stretched material to adjust the location of individual pixels within the latent space. This innovation achieves one-step latent semantic optimization and hence significantly improves editing speed.
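To make "one step" concrete, here is a generic backward warp of a latent map that moves content from a handle point to a target point in a single pass, with the displacement fading out over a radius, loosely like a stretched sheet. It is not FastDrag's latent warpage function: the linear falloff, nearest-neighbour sampling, the `(C, H, W)` layout and the name `one_step_drag` are all assumptions chosen for illustration.

```python
import numpy as np

def one_step_drag(latent: np.ndarray, handle, target, radius: float) -> np.ndarray:
    """Warp a (C, H, W) latent so content at `handle` moves to `target` in one pass.

    Hypothetical stretch model: each output position p reads its value from
    p - w(p) * d, where d = target - handle and w falls linearly from 1 at the
    target to 0 at distance `radius`; positions outside the radius are unchanged.
    """
    _, h, w = latent.shape
    d = np.asarray(target, dtype=float) - np.asarray(handle, dtype=float)
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    dist = np.hypot(ys - target[0], xs - target[1])
    weight = np.clip(1.0 - dist / radius, 0.0, 1.0)
    # Backward mapping with nearest-neighbour sampling, clamped to the grid.
    src_y = np.clip(np.rint(ys - weight * d[0]), 0, h - 1).astype(int)
    src_x = np.clip(np.rint(xs - weight * d[1]), 0, w - 1).astype(int)
    return latent[:, src_y, src_x]

# Usage: drag the feature at (2, 2) to (4, 5) on a toy 1x8x8 latent.
latent = np.arange(64, dtype=float).reshape(1, 8, 8)
edited = one_step_drag(latent, handle=(2, 2), target=(4, 5), radius=4.0)
```

Because the edit is a single gather over precomputed source indices, its cost does not depend on an iteration count n, which is the kind of saving the abstract contrasts with n-step latent optimization; how FastDrag actually defines and post-processes its warp is beyond this sketch.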


Review for NeurIPS paper: The LoCA Regret: A Consistent Metric to Evaluate Model-Based Behavior in Reinforcement Learning

Neural Information Processing Systems

This paper proposes a method for identifying model-based behavior in RL agents (the "LoCA regret"), which can be used without knowing anything about the internal structure of the agent itself. This method is shown to correctly distinguish between well-known classical model-free and model-based agents. It is also used to analyze MuZero, revealing that although MuZero is in principle a model-based algorithm, it does not make optimal use of its model. The reviewers agreed that the LoCA regret is a useful metric, and felt that careful evaluation of agents through metrics like this is an important area of research in RL. I agree, and found especially interesting the demonstration that an algorithm's use of a model does not necessarily mean that it will have the properties we associate with model-based algorithms. While there was some debate during the discussion period about some of the choices regarding the calculation of the LoCA regret (e.g.


UrbanLLM: Autonomous Urban Activity Planning and Management with Large Language Models

arXiv.org Artificial Intelligence

Location-based services play a critical role in improving the quality of our daily lives. Despite the proliferation of specialized AI models in the spatio-temporal context of location-based services, these models struggle to autonomously tackle complex urban planning and management problems. To bridge this gap, we introduce UrbanLLM, a fine-tuned large language model (LLM) designed to tackle diverse problems in urban scenarios. UrbanLLM functions as a problem-solver by decomposing urban-related queries into manageable sub-tasks, identifying suitable spatio-temporal AI models for each sub-task, and generating comprehensive responses to the given queries. Our experimental results indicate that UrbanLLM significantly outperforms other established LLMs, such as Llama and the GPT series, in handling problems concerning complex urban activity planning and management. UrbanLLM exhibits considerable potential for solving problems in urban scenarios more effectively, reducing both the workload of and the reliance on human experts.