
E2ENet: Dynamic Sparse Feature Fusion for Accurate and Efficient 3D Medical Image Segmentation

Neural Information Processing Systems

Deep neural networks have emerged as the leading approach to 3D medical image segmentation owing to their outstanding performance. However, the ever-increasing model size and computational cost of deep neural networks have become the primary barriers to deploying them on real-world, resource-limited hardware. To achieve both segmentation accuracy and efficiency, we propose a 3D medical image segmentation model called Efficient to Efficient Network (E2ENet), which incorporates two parametrically and computationally efficient designs.


Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis
Hongyu Sun, Yongcai Wang, Wang Chen

Neural Information Processing Systems

This paper investigates the 3D domain generalization (3DDG) ability of large 3D models based on prevalent prompt learning. Recent works demonstrate that the performance of 3D point cloud recognition can be boosted remarkably by parameter-efficient prompt tuning. However, we observe that the improvement on downstream tasks comes at the expense of a severe drop in 3D domain generalization. To resolve this challenge, we present a comprehensive regulation framework that allows the learnable prompts to actively interact with the well-learned general knowledge in large 3D models to maintain good generalization. Specifically, the proposed framework imposes multiple explicit constraints on the prompt learning trajectory by maximizing the mutual agreement between task-specific predictions and task-agnostic knowledge.
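
The regulation idea lends itself to a compact sketch: below, the mutual agreement between task-specific predictions (from the prompted model) and task-agnostic knowledge (from the frozen model) is encouraged with a KL term added to the task loss. This is a minimal illustration of the constraint described above, not the Point-PRC implementation; the function and argument names are assumptions.

```python
# Minimal sketch of a mutual-agreement constraint on prompt learning.
# `prompted_logits`, `frozen_logits`, and `lambda_reg` are illustrative
# names, not taken from the Point-PRC codebase.
import torch
import torch.nn.functional as F

def regulated_loss(prompted_logits: torch.Tensor,
                   frozen_logits: torch.Tensor,
                   labels: torch.Tensor,
                   lambda_reg: float = 1.0) -> torch.Tensor:
    """Task loss plus a KL term that keeps the learnable prompts close
    to the well-learned general knowledge of the frozen 3D model."""
    task_loss = F.cross_entropy(prompted_logits, labels)
    # Maximizing mutual agreement ~ minimizing KL between the two
    # predictive distributions; the frozen branch receives no gradients.
    agreement = F.kl_div(
        F.log_softmax(prompted_logits, dim=-1),
        F.softmax(frozen_logits.detach(), dim=-1),
        reduction="batchmean",
    )
    return task_loss + lambda_reg * agreement
```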


A Additional Related Work

Neural Information Processing Systems

A.1 Per-Instance Search

Once the neural network is trained over a collection of problem instances, per-instance fine-tuning can be used to improve the quality of solutions via local search. For DRL solvers, Bello et al. [7] fine-tuned the policy network on each test graph, a procedure referred to as active search. Hottung et al. [28] proposed three active search strategies for efficiently updating subsets of parameters during search. Hottung et al. [27] performed per-instance search in a differentiable continuous space encoded by a conditional variational auto-encoder [39]. Given a heatmap indicating the promising parts of the search space, discrete solutions can be found via beam search [31], sampling [42], guided tree search [48], dynamic programming [44], and Monte Carlo Tree Search (MCTS) [19].
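
For concreteness, here is a minimal sketch of active search in the spirit of Bello et al. [7]: the trained policy is fine-tuned on a single test instance with REINFORCE, keeping the best solution found. The `policy.sample(...)` interface is a hypothetical stand-in, not an API from any of the cited works.

```python
# Per-instance active search: fine-tune a copy of the trained policy on
# one test graph and return the best solution encountered.
import copy
import torch

def active_search(policy, graph, steps=100, lr=1e-5):
    policy = copy.deepcopy(policy)            # per-instance copy
    opt = torch.optim.Adam(policy.parameters(), lr=lr)
    best_cost, best_sol = float("inf"), None
    for _ in range(steps):
        # Hypothetical interface: sampled solutions, their costs
        # (no gradient), and the log-probability of each sample.
        sols, costs, log_probs = policy.sample(graph, batch_size=64)
        baseline = costs.mean()               # variance-reducing baseline
        loss = ((costs - baseline) * log_probs).mean()  # REINFORCE
        opt.zero_grad()
        loss.backward()
        opt.step()
        i = costs.argmin()
        if costs[i] < best_cost:
            best_cost, best_sol = costs[i].item(), sols[i]
    return best_sol, best_cost
```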



Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models

Neural Information Processing Systems

Large language models (LLMs) based on decoder-only transformers have demonstrated superior text understanding capabilities compared to CLIP and T5-series models. However, the paradigm for utilizing current advanced LLMs in text-to-image diffusion models remains to be explored. We observed an unusual phenomenon: directly using an LLM as the prompt encoder significantly degrades the prompt-following ability of image generation. We identified two main obstacles behind this issue. One is the misalignment between the next-token prediction objective used to train LLMs and the discriminative prompt features required by diffusion models.
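
As a rough illustration of the setup under study, the sketch below conditions a diffusion U-Net on per-token hidden states from a decoder-only LLM; the model checkpoint and the `unet(...)` call are illustrative assumptions, not the paper's pipeline.

```python
# Sketch: last hidden states of a decoder-only LLM used as prompt
# features for a text-to-image diffusion model. The checkpoint name is
# an assumption for illustration.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
llm = AutoModel.from_pretrained("meta-llama/Llama-2-7b-hf")

@torch.no_grad()
def encode_prompt(prompt: str) -> torch.Tensor:
    """Per-token hidden states used as cross-attention conditioning.
    Because the LLM is trained for next-token prediction, these features
    are not inherently discriminative -- the misalignment noted above."""
    inputs = tok(prompt, return_tensors="pt")
    return llm(**inputs).last_hidden_state    # (1, seq_len, hidden_dim)

# cond = encode_prompt("a red cube on a blue table")
# eps = unet(noisy_latents, timestep, encoder_hidden_states=cond)
```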


Principled Probabilistic Imaging using Diffusion Models as Plug-and-Play Priors

Neural Information Processing Systems

Diffusion models (DMs) have recently shown outstanding capabilities in modeling complex image distributions, making them expressive image priors for solving Bayesian inverse problems. However, most existing DM-based methods rely on approximations in the generative process in order to be generic to different inverse problems, leading to inaccurate sample distributions that deviate from the target posterior defined within the Bayesian framework. To harness the generative power of DMs while avoiding such approximations, we propose a Markov chain Monte Carlo algorithm that performs posterior sampling for general inverse problems by reducing it to sampling the posterior of a Gaussian denoising problem. Crucially, we leverage a general DM formulation as a unified interface that allows for rigorously solving the denoising problem with a range of state-of-the-art DMs. We demonstrate the effectiveness of the proposed method on six inverse problems (three linear and three nonlinear), including a real-world black hole imaging problem. Experimental results indicate that our proposed method offers more accurate reconstructions and posterior estimation than existing DM-based methods for imaging inverse problems.
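
The reduction the abstract describes can be sketched as a split-Gibbs-style alternation between a likelihood step and a Gaussian denoising step handled by the DM. The sketch below is an assumed outline under that reading, with `sample_denoising_posterior` as a hypothetical stand-in for the DM interface; it is not the paper's exact algorithm.

```python
# Alternating sampler for an inverse problem y = A(x) + n: step (a)
# targets the likelihood-coupled conditional via Langevin updates, and
# step (b) asks the DM to sample a Gaussian denoising posterior.
import torch

def pnp_posterior_sampler(y, forward_op, sample_denoising_posterior,
                          x0, rho=0.1, iters=500, step=1e-3):
    x, z = x0.clone(), x0.clone()
    for _ in range(iters):
        # (a) likelihood step: a few unadjusted Langevin updates on
        #     -log p(x | y, z); the observation-noise variance is
        #     folded into `step` for simplicity.
        for _ in range(10):
            x = x.detach().requires_grad_(True)
            data_fit = ((y - forward_op(x)) ** 2).sum()
            coupling = ((x - z) ** 2).sum() / (2 * rho ** 2)
            grad = torch.autograd.grad(data_fit + coupling, x)[0]
            x = x - step * grad + (2 * step) ** 0.5 * torch.randn_like(x)
        # (b) prior step: the DM solves the Gaussian denoising problem
        #     z ~ p(z | x) at noise level rho.
        z = sample_denoising_posterior(x.detach(), noise_std=rho)
    return x.detach()
```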


A Amortizing the learning of the weight vector by introducing a weight net

Neural Information Processing Systems

To integrate our proposed method with deep learning frameworks, we adopt a stochastic setting, i.e., a mini-batch setting at each iteration. We adopt two-stage learning: stage 1 trains the model f(θ) with the standard cross-entropy loss on the imbalanced training set, and stage 2 learns the weight vector w while continuing to update the model f(θ). In general, computing the optimal θ and w at stage 2 requires two nested optimization loops, which is computationally expensive. Instead, we optimize θ and w alternately, corresponding to (1) and (10) respectively, where w is maintained and updated throughout training so that re-estimation from scratch is avoided at each iteration. We summarize the amortized learning of w in Algorithm 2, where the key steps are highlighted as Steps (a), (b), and (c).
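
A minimal sketch of one stage-2 iteration under this alternating scheme is given below. Since the exact forms of (1) and (10) are not reproduced in this excerpt, the w-objective is passed in as a callable, and all names are illustrative rather than taken from Algorithm 2.

```python
# One stage-2 iteration: alternate single gradient steps on θ and w,
# with w persisted across iterations (never re-estimated from scratch).
import torch
import torch.nn.functional as F

def stage2_step(model, w, batch, opt_theta, opt_w, w_objective):
    x, y = batch
    # (a) update θ under the current (frozen) class weights w
    per_sample = F.cross_entropy(model(x), y, reduction="none")
    loss_theta = (w.detach()[y] * per_sample).mean()
    opt_theta.zero_grad()
    loss_theta.backward()
    opt_theta.step()
    # (b) update w with θ fixed -- a stand-in for the paper's Eq. (10)
    loss_w = w_objective(model, w, batch)
    opt_w.zero_grad()
    loss_w.backward()
    opt_w.step()
    # (c) w carries over to the next iteration, amortizing its learning
    return loss_theta.item(), loss_w.item()
```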


Sigmoid Gating is More Sample Efficient than Softmax Gating in Mixture of Experts
Huy Nguyen, Nhat Ho
Department of Statistics and Data Sciences, The University of Texas at Austin

Neural Information Processing Systems

The softmax gating function is arguably the most popular choice in mixture of experts modeling. Despite its widespread use in practice, the softmax gating may lead to unnecessary competition among experts, potentially causing the undesirable phenomenon of representation collapse due to its inherent structure. In response, the sigmoid gating function has recently been proposed as an alternative and has been demonstrated empirically to achieve superior performance. However, a rigorous examination of the sigmoid gating function is lacking in the current literature. In this paper, we verify theoretically that the sigmoid gating, in fact, enjoys a higher sample efficiency than the softmax gating for the statistical task of expert estimation. Towards that goal, we consider a regression framework in which the unknown regression function is modeled as a mixture of experts, and study the rates of convergence of the least squares estimator under the over-specified case in which the number of fitted experts is larger than the true value. We show that two gating regimes naturally arise and, in each of them, we formulate an identifiability condition for the expert functions and derive the corresponding convergence rates. In both cases, we find that experts formulated as feed-forward networks with commonly used activations such as ReLU and GELU enjoy faster convergence rates under the sigmoid gating than under the softmax gating. Furthermore, given the same choice of experts, we demonstrate that the sigmoid gating function requires a smaller sample size than its softmax counterpart to attain the same expert-estimation error and, therefore, is more sample efficient.
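
To make the contrast concrete, the following sketch implements a dense mixture-of-experts layer with both gating choices: softmax weights compete across experts (they sum to one), while sigmoid gates score each expert independently. The layer sizes and expert architecture are illustrative assumptions.

```python
# Dense MoE layer contrasting softmax gating (competing weights) with
# sigmoid gating (independent per-expert gates).
import torch
import torch.nn as nn

class MoE(nn.Module):
    def __init__(self, d_in, d_hidden, n_experts, gating="sigmoid"):
        super().__init__()
        self.gating = gating
        self.gate = nn.Linear(d_in, n_experts)
        # feed-forward experts with GELU activation, as in the abstract
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_in, d_hidden), nn.GELU(),
                          nn.Linear(d_hidden, d_in))
            for _ in range(n_experts))

    def forward(self, x):
        scores = self.gate(x)                    # (batch, n_experts)
        if self.gating == "softmax":
            g = scores.softmax(dim=-1)           # weights compete, sum to 1
        else:
            g = torch.sigmoid(scores)            # independent gates
        out = torch.stack([e(x) for e in self.experts], dim=-1)
        return (out * g.unsqueeze(1)).sum(dim=-1)   # (batch, d_in)
```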