Nearest Neighbor Speculative Decoding for LLM Generation and Attribution

Neural Information Processing Systems

Large language models (LLMs) often hallucinate and lack the ability to provide attribution for their generations. Semi-parametric LMs, such as kNN-LM, address these limitations by refining the output of an LM for a given prompt using its nearest neighbor matches in a non-parametric data store. However, these models often exhibit slow inference speeds and produce non-fluent text.
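To make the refinement step concrete, below is a minimal sketch of the kNN-LM-style interpolation that such semi-parametric LMs build on: the next-token distribution is a mixture of the parametric LM's distribution and a distribution induced by the k nearest datastore entries. The datastore layout, temperature, and mixture weight `lam` here are illustrative assumptions, not the paper's exact method.

    import numpy as np

    def knn_lm_next_token_probs(query, datastore_keys, datastore_values,
                                lm_probs, vocab_size, k=8, lam=0.25, temp=1.0):
        """Interpolate a parametric LM with a kNN distribution over a datastore.

        datastore_keys:   (N, d) stored context embeddings
        datastore_values: (N,)   next-token ids observed after each context
        lm_probs:         (V,)   the parametric LM's next-token distribution
        """
        # Retrieve the k nearest stored contexts by L2 distance.
        dists = np.linalg.norm(datastore_keys - query, axis=1)
        nn = np.argsort(dists)[:k]

        # Turn negative distances into a distribution over retrieved tokens.
        weights = np.exp(-dists[nn] / temp)
        weights /= weights.sum()
        knn_probs = np.zeros(vocab_size)
        for w, tok in zip(weights, datastore_values[nn]):
            knn_probs[tok] += w

        # Final distribution: mixture of non-parametric and parametric parts.
        return lam * knn_probs + (1.0 - lam) * lm_probs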


Shaping Belief States with Generative Environment Models for RL

Neural Information Processing Systems

When agents interact with a complex environment, they must form and maintain beliefs about the relevant aspects of that environment. We propose a way to efficiently train expressive generative models in complex environments. We show that a predictive algorithm with an expressive generative model can form stable belief-states in visually rich and dynamic 3D environments. More precisely, we show that the learned representation captures the layout of the environment as well as the position and orientation of the agent. Our experiments show that the model substantially improves data-efficiency on a number of reinforcement learning (RL) tasks compared with strong model-free baseline agents. We find that predicting multiple steps into the future (overshooting), in combination with an expressive generative model, is critical for stable representations to emerge. In practice, using expressive generative models in RL is computationally expensive and we propose a scheme to reduce this computational burden, allowing us to build agents that are competitive with model-free baselines.
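The overshooting idea can be sketched in a few lines, under stated assumptions: `transition` and `decoder` stand in for the generative model's latent transition and observation decoder, and the squared-error term is a placeholder for the model's actual likelihood objective rather than the authors' loss.

    import torch

    def overshooting_loss(belief, actions, observations,
                          transition, decoder, horizon=5):
        """Multi-step 'overshooting': roll the latent model forward several
        steps and score each predicted observation, not just the next one.

        belief:       (B, d)      current belief state
        actions:      (T, B, a)   actions taken from this state onward
        observations: (T, B, ...) ground-truth future observations
        """
        steps = min(horizon, actions.shape[0])
        state, loss = belief, 0.0
        for t in range(steps):
            state = transition(state, actions[t])  # latent rollout, no new inputs
            pred = decoder(state)                  # predicted observation at t+1
            loss = loss + torch.mean((pred - observations[t]) ** 2)
        return loss / steps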


Author Response for 'Shaping Belief States with Generative Environment Models for RL'

Neural Information Processing Systems

We are grateful for all the constructive and actionable feedback provided by the reviewers, and especially thankful for the detailed feedback from R2 and R3. We believe we have addressed the key concerns raised by the reviewers below. We also understand R1's concerns with our main hypothesis. We are working to improve our explanations in Section 2.2 based on all the feedback. We emphasize that careful empirical experimentation in ML can also bring valuable insights to the community; studying these factors requires an intersectional empirical study such as this paper.


Interval timing in deep reinforcement learning agents

Neural Information Processing Systems

The measurement of time is central to intelligent behavior. We know that both animals and artificial agents can successfully use temporal dependencies to select actions. In artificial agents, little work has directly addressed (1) which architectural components are necessary for successful development of this ability, (2) how this timing ability comes to be represented in the units and actions of the agent, and (3) whether the resulting behavior of the system converges on solutions similar to those of biology. Here we studied interval timing abilities in deep reinforcement learning agents trained end-to-end on an interval reproduction paradigm inspired by experimental literature on mechanisms of timing. We characterize the strategies developed by recurrent and feedforward agents, which both succeed at temporal reproduction using distinct mechanisms, some of which bear specific and intriguing similarities to biological systems. These findings advance our understanding of how agents come to represent time, and they highlight the value of experimentally inspired approaches to characterizing agent abilities.
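For readers unfamiliar with the paradigm, the sketch below is a deliberately simplified, hypothetical interval-reproduction environment, not the authors' actual task: a cue stays on for a sampled number of steps, and the agent is rewarded for signaling after roughly the same number of steps has elapsed.

    import numpy as np

    class IntervalReproduction:
        """Toy interval-reproduction task (a hypothetical stand-in for the
        paradigm described above): a cue is on for `sample` steps, then the
        agent should wait the same number of steps before acting."""

        def __init__(self, min_t=5, max_t=20, tolerance=2, rng=None):
            self.min_t, self.max_t, self.tol = min_t, max_t, tolerance
            self.rng = rng or np.random.default_rng()

        def reset(self):
            self.sample = int(self.rng.integers(self.min_t, self.max_t + 1))
            self.t = 0
            return np.array([1.0])            # cue on

        def step(self, action):
            self.t += 1
            if self.t <= self.sample:         # measurement phase: just observe
                return np.array([1.0]), 0.0, False
            elapsed = self.t - self.sample    # reproduction phase
            if action == 1:                   # agent signals "interval elapsed"
                reward = 1.0 if abs(elapsed - self.sample) <= self.tol else -1.0
                return np.array([0.0]), reward, True
            return np.array([0.0]), 0.0, elapsed > 2 * self.sample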


3D Gaussian Splatting as Markov Chain Monte Carlo

Neural Information Processing Systems

While 3D Gaussian Splatting has recently become popular for neural rendering, current methods rely on carefully engineered cloning and splitting strategies for placing Gaussians, which can lead to poor-quality renderings and a reliance on good initialization. In this work, we rethink the set of 3D Gaussians as a random sample drawn from an underlying probability distribution describing the physical representation of the scene -- in other words, as Markov Chain Monte Carlo (MCMC) samples. Under this view, we show that the 3D Gaussian updates can be converted into Stochastic Gradient Langevin Dynamics (SGLD) updates by simply introducing noise. We then rewrite the densification and pruning strategies in 3D Gaussian Splatting as simply a deterministic state transition of MCMC samples, removing these heuristics from the framework. To do so, we revise the 'cloning' of Gaussians into a relocalization scheme that approximately preserves sample probability. To encourage efficient use of Gaussians, we introduce a regularizer that promotes the removal of unused Gaussians. On various standard evaluation scenes, we show that our method provides improved rendering quality, easy control over the number of Gaussians, and robustness to initialization.
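The SGLD view amounts to adding Gaussian noise to an ordinary gradient step; here is a generic sketch with an illustrative step size and noise scale, not the paper's exact update for Gaussian parameters.

    import torch

    def sgld_step(params, loss_fn, lr=1e-3):
        """One generic SGLD update: a gradient step plus Gaussian noise of
        variance 2*lr, which turns optimization into approximate sampling.
        `params` must be a tensor with requires_grad=True."""
        loss = loss_fn(params)
        grad, = torch.autograd.grad(loss, params)
        with torch.no_grad():
            params -= lr * grad
            params += (2.0 * lr) ** 0.5 * torch.randn_like(params)
        return params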


Towards Dynamic Message Passing on Graphs

Neural Information Processing Systems

Message passing plays a vital role in graph neural networks (GNNs) for effective feature learning. However, over-reliance on the input topology diminishes the efficacy of message passing and restricts the capability of GNNs. Despite efforts to mitigate this reliance, existing studies encounter message-passing bottlenecks or high computational expense, which motivates the demand for flexible message passing with low complexity. In this paper, we propose a novel dynamic message-passing mechanism for GNNs. It projects graph nodes and learnable pseudo nodes into a common space with measurable spatial relations between them. With nodes moving in this space, their evolving relations facilitate flexible pathway construction for a dynamic message-passing process. By associating pseudo nodes with input graphs through their measured relations, graph nodes can communicate with each other indirectly through pseudo nodes at linear complexity.
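A rough sketch of why routing messages through K pseudo nodes keeps the cost linear in the number of graph nodes N: each node exchanges messages only with the K pseudo nodes, giving O(N*K) work instead of O(N^2). The projections `w_np` and `w_pn` and the softmax-over-distances relation below are illustrative assumptions, not the paper's exact formulation.

    import torch

    def pseudo_node_message_passing(x, pseudo, w_np, w_pn):
        """x:      (N, d) graph-node features projected into the common space
           pseudo: (K, d) learnable pseudo-node positions in the same space
           w_np, w_pn: (d, d) node-to-pseudo / pseudo-to-node projections"""
        # Spatial relations between graph nodes and pseudo nodes: (N, K).
        rel = torch.softmax(-torch.cdist(x, pseudo), dim=1)

        # Gather: pseudo nodes aggregate messages from graph nodes, O(N*K).
        pseudo_msg = rel.t() @ (x @ w_np)              # (K, d)

        # Scatter back: any two graph nodes can communicate through at
        # most one pseudo-node hop, again O(N*K).
        return rel @ (pseudo_msg @ w_pn)               # (N, d)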


af5d5ef24881f3c3049a7b9bfe74d58b-AuthorFeedback.pdf

Neural Information Processing Systems

We would like to begin by thanking all the reviewers for their hard work in providing us with such insightful feedback. Several reviewers noted that the guarantee in (9) may no longer hold post-approximation (R1, R3, R5). Furthermore, we are grateful to R3 for noting that our approximations are both intuitive and effective. We would also like to clarify the use of the indicator function in response to R1 and R2. Several reviewers recommended adding the constraint threshold values to the tables (R1, R2, R3).


Learning Non-Convergent Non-Persistent Short-Run MCMC Toward Energy-Based Model

Neural Information Processing Systems

This paper studies a curious phenomenon in learning an energy-based model (EBM) using MCMC. In each learning iteration, we generate synthesized examples by running a non-convergent, non-mixing, and non-persistent short-run MCMC toward the current model, always starting from the same initial distribution, such as a uniform noise distribution, and always running a fixed number of MCMC steps. After generating synthesized examples, we then update the model parameters according to the maximum likelihood learning gradient, as if the synthesized examples were fair samples from the current model. We treat this non-convergent short-run MCMC as a learned generator model or a flow model. We provide arguments for treating the learned non-convergent short-run MCMC as a valid model. We show that the learned short-run MCMC is capable of generating realistic images. More interestingly, unlike traditional EBM or MCMC, the learned short-run MCMC is capable of reconstructing observed images and interpolating between images, like generator or flow models. The code can be found in the Appendix.
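A minimal sketch of such a short-run, non-persistent Langevin sampler, assuming `energy` is a differentiable EBM; the step size and step count below are illustrative, not the paper's settings.

    import torch

    def short_run_mcmc(energy, n_samples, shape, k=100, step=0.01):
        """Short-run Langevin sampler: always start from the same fixed
        initial distribution (uniform noise) and always run exactly k
        steps, treating the resulting sampler itself as the generator."""
        x = torch.rand(n_samples, *shape) * 2 - 1   # fixed initial distribution
        for _ in range(k):
            x.requires_grad_(True)
            grad, = torch.autograd.grad(energy(x).sum(), x)
            # Langevin update: gradient descent on energy plus injected noise.
            x = (x - 0.5 * step ** 2 * grad
                 + step * torch.randn_like(x)).detach()
        return x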


2bc8ae25856bc2a6a1333d1331a3b7a6-AuthorFeedback.pdf

Neural Information Processing Systems

Reply to Reviewer 2: Thank you for the insightful and comprehensive summary of our work. A1: As you have pointed out, each iteration requires computing K derivatives of the CNN. We will add this information in the revision. Q2: Dynamic K. A2: Following your advice, we conducted experiments with a random K drawn from [100, 120] for training. We can still learn the short-run MCMC successfully.