Cascade of phase transitions in the training of Energy-based models

Neural Information Processing Systems

In this paper, we investigate the feature encoding process in a prototypical energy-based generative model, the Restricted Boltzmann Machine (RBM). We start with an analytical investigation using simplified architectures and data structures, and end with a numerical analysis of real trainings on real datasets. Our study tracks the evolution of the model's weight matrix through its singular value decomposition, revealing a series of phase transitions associated with the progressive learning of the principal modes of the empirical probability distribution. The model first learns the center of mass of the modes and then progressively resolves each mode through a cascade of phase transitions. We first describe this process in a controlled setup that allows us to study the training dynamics analytically.
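The singular-value picture described above can be reproduced in a toy experiment: train a small binary RBM with contrastive divergence on two-mode synthetic data and watch the leading singular values of the weight matrix grow one after another. This is a minimal sketch under illustrative assumptions (binary units, biases omitted, CD-1, arbitrary learning rate), not the paper's controlled setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy binary data with two modes (an illustrative stand-in for the paper's
# controlled setup): two random prototypes with 5% bit flips.
n, d, h = 400, 20, 10
proto = (rng.random((2, d)) < 0.5).astype(float)
X = np.vstack([np.abs(p - (rng.random((n // 2, d)) < 0.05)) for p in proto])

W = 0.01 * rng.standard_normal((d, h))  # RBM weight matrix (biases omitted)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.1
for epoch in range(300):
    # CD-1: one Gibbs step starting from the data.
    ph = sigmoid(X @ W)                      # hidden activations given data
    hs = (rng.random(ph.shape) < ph) * 1.0   # sampled hidden states
    pv = sigmoid(hs @ W.T)                   # reconstructed visibles
    phr = sigmoid(pv @ W)
    W += lr * (X.T @ ph - pv.T @ phr) / n    # contrastive-divergence update

# Leading singular modes of W, which switch on progressively as the
# modes of the data distribution are resolved during training.
s = np.linalg.svd(W, compute_uv=False)
print(np.round(s[:3], 3))
```

Tracking `s` at intermediate epochs (rather than only at the end) exposes the staggered growth of the modes that the paper identifies as phase transitions.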


ExpO

Neural Information Processing Systems

Most of the work on interpretable machine learning has focused on designing either inherently interpretable models, which typically trade off accuracy for interpretability, or post-hoc explanation systems, whose explanation quality can be unpredictable.


Selective Explanations

Neural Information Processing Systems

Feature attribution methods explain black-box machine learning (ML) models by assigning importance scores to input features. These methods can be computationally expensive for large ML models. To address this challenge, there have been increasing efforts to develop amortized explainers, where an ML model is trained to efficiently approximate computationally expensive feature attribution scores. Despite their efficiency, amortized explainers can produce misleading explanations. In this paper, we propose selective explanations to (i) detect when amortized explainers generate inaccurate explanations and (ii) improve the approximation of the explanation using a technique we call explanations with initial guess. Selective explanations allow practitioners to specify the fraction of samples that receive explanations with initial guess, offering a principled way to bridge the gap between amortized explainers (one inference) and more computationally costly approximations (multiple inferences). Our experiments on various models and datasets demonstrate that feature attributions via selective explanations strike a favorable balance between explanation quality and computational efficiency.
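The selection mechanism can be sketched as follows. Everything here is hypothetical: the explainer functions are stand-ins, the norm-based detector replaces whatever detector the paper actually uses, and the 50/50 mixing weight for the "initial guess" refinement is an arbitrary illustration of combining a cheap and a costly estimate.

```python
import numpy as np

rng = np.random.default_rng(1)

def amortized_explainer(x):
    # Hypothetical cheap surrogate: a single forward pass that roughly
    # approximates the expensive attribution.
    return 0.9 * x + 0.05 * rng.standard_normal(x.shape)

def expensive_attribution(x, n_samples=64):
    # Stand-in for a costly sampling-based attribution requiring many
    # model evaluations per explained sample.
    return x + 0.01 * rng.standard_normal((n_samples,) + x.shape).mean(0)

def selective_explanations(X, frac=0.2):
    cheap = np.stack([amortized_explainer(x) for x in X])
    # Hypothetical detector: flag the samples whose amortized attributions
    # look least reliable (the paper learns a proper detector instead).
    score = np.linalg.norm(cheap, axis=1)
    k = max(1, int(frac * len(X)))
    flagged = np.argsort(score)[-k:]
    out = cheap.copy()
    for i in flagged:
        # "Explanation with initial guess": refine the amortized output by
        # mixing it with a small number of costly estimates.
        out[i] = 0.5 * cheap[i] + 0.5 * expensive_attribution(X[i])
    return out

attrs = selective_explanations(rng.standard_normal((10, 5)), frac=0.2)
print(attrs.shape)
```

The `frac` parameter is the practitioner-controlled fraction from the abstract: `frac=0` recovers the pure amortized explainer, `frac=1` refines every sample.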


Dual Lagrangian Learning for Conic Optimization

Neural Information Processing Systems

This paper presents Dual Lagrangian Learning (DLL), a principled learning methodology for dual conic optimization proxies. DLL leverages conic duality and the representation power of ML models to provide high-quality, dual-feasible solutions, and therefore valid Lagrangian dual bounds, for linear and nonlinear conic optimization problems. The paper introduces a systematic dual completion procedure, differentiable conic projection layers, and a self-supervised learning framework based on Lagrangian duality. It also provides closed-form dual completion formulae for broad classes of conic problems, which eliminate the need for costly implicit layers. The effectiveness of DLL is demonstrated on linear and nonlinear conic optimization problems. The proposed methodology significantly outperforms a state-of-the-art learning-based method, and achieves 1000x speedups over commercial interior-point solvers with optimality gaps under 0.5% on average.
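The core idea of certifying a bound from a predicted dual point can be illustrated on the simplest conic case, an LP over the nonnegative orthant. This is a minimal sketch of weak duality, not DLL's learned completion: the problem data and the "predicted" dual values below are made up for illustration.

```python
import numpy as np

# Tiny LP (an illustrative stand-in for the conic problems in the paper):
#   min c^T x   s.t.   A x = b,   x >= 0
c = np.array([1.0, 2.0])
A = np.array([[1.0, 1.0]])
b = np.array([1.0])

def dual_bound(y):
    """Return a valid Lagrangian lower bound if y is dual feasible.

    Dual feasibility for the nonnegative orthant: s = c - A^T y >= 0.
    By weak duality, any such y certifies  b^T y <= min c^T x.
    """
    s = c - A.T @ y
    if np.all(s >= -1e-9):
        return float(b @ y)
    return -np.inf  # infeasible prediction: no certificate

# Hypothetical ML-predicted duals (the paper trains a model for this);
# the true optimum of this primal is 1.0, attained at x = (1, 0).
good = dual_bound(np.array([0.8]))   # valid bound: 0.8 <= 1.0
bad = dual_bound(np.array([3.0]))    # violates s >= 0, so no bound
print(good, bad)
```

DLL's contribution sits precisely in the gap this sketch glosses over: completing a raw prediction into a dual-feasible point in closed form (or via differentiable conic projections) so that the certified bound is tight.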


Empowering Active Learning for 3D Molecular Graphs with Geometric Graph Isomorphism Lu Wei

Neural Information Processing Systems

Molecular learning is pivotal in many real-world applications, such as drug discovery. Supervised learning requires heavy human annotation, which is particularly challenging for molecular data; e.g., the commonly used density functional theory (DFT) is highly computationally expensive. Active learning (AL) automatically queries labels for the most informative samples, thereby remarkably alleviating the annotation hurdle. In this paper, we present a principled AL paradigm for molecular learning, where we treat molecules as 3D molecular graphs. Specifically, we propose a new diversity sampling method to eliminate mutual redundancy built on distributions of 3D geometries.
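Diversity sampling over distributions of 3D geometries can be sketched with a simple pairwise-distance histogram as the geometric descriptor and greedy farthest-point (k-center) selection. Both choices are illustrative assumptions; the paper's descriptor and redundancy criterion are more principled.

```python
import numpy as np

rng = np.random.default_rng(2)

def geometry_descriptor(coords, bins=16, rmax=5.0):
    """Histogram of pairwise atomic distances: a simple stand-in for the
    paper's distributions of 3D geometries."""
    d = np.linalg.norm(coords[:, None] - coords[None, :], axis=-1)
    hist, _ = np.histogram(d[np.triu_indices(len(coords), 1)],
                           bins=bins, range=(0, rmax), density=True)
    return hist / (hist.sum() + 1e-12)

def farthest_point_selection(descs, k):
    """Greedy k-center: pick molecules that are mutually non-redundant."""
    chosen = [0]
    dist = np.linalg.norm(descs - descs[0], ord=1, axis=1)
    for _ in range(k - 1):
        nxt = int(np.argmax(dist))          # most distant from chosen set
        chosen.append(nxt)
        dist = np.minimum(dist, np.linalg.norm(descs - descs[nxt],
                                               ord=1, axis=1))
    return chosen

# Random point clouds as fake "molecules" (5-11 atoms each).
mols = [rng.standard_normal((rng.integers(5, 12), 3)) for _ in range(50)]
descs = np.stack([geometry_descriptor(m) for m in mols])
sel = farthest_point_selection(descs, 5)
print(sel)
```

Each greedy step maximizes the distance to the already-selected set, which is exactly the "eliminate mutual redundancy" goal stated above, here under an L1 distance between geometry histograms.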


NeurIPS_wappendix

Neural Information Processing Systems

A.1 Gaussian Process. The generative model of a GP is given by the prior over the latent function, p(f|X) = N(f | 0, K), and the likelihood of the data, p(y|f) = N(y | f, sigma_n^2 I), respectively. The generative model implies that the joint posterior over the latent parameters is p(f|y,X) = p(y|f) p(f|X) / p(y|X). Three cases can be distinguished: 1. the parameter of interest is f, and the hyperparameters are nuisance parameters; 2. the parameters of interest are both f and the hyperparameters; 3. the parameters of interest are the hyperparameters, and f is a nuisance parameter. Since f is given by the outer expectation and y depends on f in the inner expectation, the second term is independent of x. For d-dimensional simulators, an FBGP with an ARD RBF kernel has d+1 hyperparameters, and thus we cannot plot the joint posterior for simulators with more than one input dimension.
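The posterior factorization above corresponds to standard GP regression, which can be computed in closed form. This is a minimal sketch assuming an RBF kernel, zero prior mean, and a made-up 1D dataset; the appendix's exact kernel and notation may differ.

```python
import numpy as np

rng = np.random.default_rng(3)

def rbf(X1, X2, ell=1.0, sf=1.0):
    # Squared-exponential kernel k(x, x') = sf^2 exp(-(x-x')^2 / 2 ell^2)
    d2 = (X1[:, None] - X2[None, :]) ** 2
    return sf**2 * np.exp(-0.5 * d2 / ell**2)

# Prior p(f|X) = N(0, K), likelihood p(y|f) = N(f, sn^2 I).
X = np.linspace(0, 5, 30)
y = np.sin(X) + 0.1 * rng.standard_normal(30)
sn = 0.1

K = rbf(X, X)
Kn = K + sn**2 * np.eye(len(X))
L = np.linalg.cholesky(Kn)
alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))

# Posterior p(f|y,X) = p(y|f) p(f|X) / p(y|X), evaluated at the
# training inputs: Gaussian with the mean and covariance below.
f_mean = K @ alpha
f_cov = K - K @ np.linalg.solve(Kn, K)
print(np.round(f_mean[:3], 3), np.round(np.diag(f_cov)[:3], 4))
```

The hyperparameters (`ell`, `sf`, `sn`) are fixed here; the distinction drawn in cases 1-3 above is about whether these, f, or both are treated as the parameters of interest.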


NeurIPS_wappendix

Neural Information Processing Systems

The bias-variance trade-off is a well-known problem in machine learning that becomes more pronounced the less data is available. In active learning, where labeled data is scarce or difficult to obtain, neglecting this trade-off can cause inefficient and non-optimal querying, leading to unnecessary data labeling. In this paper, we focus on active learning with Gaussian Processes (GPs). For the GP, the bias-variance trade-off is made by optimization of two hyperparameters: the length scale and the noise term. Considering that the optimal mode of the joint posterior of the hyperparameters is equivalent to the optimal bias-variance trade-off, we approximate this joint posterior and utilize it to design two new acquisition functions. The first is a Bayesian variant of Query-by-Committee (B-QBC), and the second is an extension that explicitly minimizes the predictive variance through a Query by Mixture of Gaussian Processes (QB-MGP) formulation. Across six simulators, we empirically show that B-QBC, on average, achieves the best marginal likelihood, whereas QB-MGP achieves the best predictive performance. We show that incorporating the bias-variance trade-off in the acquisition functions mitigates unnecessary and expensive data labeling.
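A B-QBC-style acquisition can be sketched by forming a committee of GPs, one per sample of the hyperparameter posterior, and querying where their mean predictions disagree most. The committee samples, kernel, and dataset below are made-up illustrations; the paper approximates the joint hyperparameter posterior properly rather than hard-coding it.

```python
import numpy as np

def rbf(X1, X2, ell):
    return np.exp(-0.5 * (X1[:, None] - X2[None, :]) ** 2 / ell**2)

def gp_mean(Xtr, ytr, Xq, ell, sn):
    # GP posterior mean at query points for given hyperparameters.
    K = rbf(Xtr, Xtr, ell) + sn**2 * np.eye(len(Xtr))
    return rbf(Xq, Xtr, ell) @ np.linalg.solve(K, ytr)

# Small labeled set and a grid of candidate queries.
Xtr = np.array([0.0, 1.0, 4.0])
ytr = np.sin(Xtr)
Xq = np.linspace(0, 5, 101)

# Hypothetical samples (length scale, noise) from the joint
# hyperparameter posterior; in the paper these come from an
# approximation of that posterior.
committee = [(0.5, 0.05), (1.0, 0.1), (2.0, 0.2)]

# Query where the committee disagrees most about the mean prediction.
means = np.stack([gp_mean(Xtr, ytr, Xq, ell, sn) for ell, sn in committee])
x_next = Xq[np.argmax(means.var(axis=0))]
print(round(float(x_next), 2))
```

Because each committee member embodies a different bias-variance trade-off, disagreement between them concentrates exactly where the hyperparameter uncertainty matters, which is the intuition behind tying the acquisition to the joint hyperparameter posterior.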



76cf99d3614e23eabab16fb27e944bf9-AuthorFeedback.pdf

Neural Information Processing Systems

We thank all reviewers for their valuable comments. Is the search sufficient to explore the entire search space? The proportion of explored architectures is determined by how many architectures are covered by the search process. We count how many distinct architectures are covered in the 50 epochs. This shows that the difference is not significant.


DreamClear: High-Capacity Real-World Image Restoration with Privacy-Safe Dataset Curation Yuang Ai, Xiaoqiang Zhou, Huaibo Huang, Xiaotian Han

Neural Information Processing Systems

Image restoration (IR) in real-world scenarios presents significant challenges due to the lack of high-capacity models and comprehensive datasets. To tackle these issues, we present a dual strategy: GenIR, an innovative data curation pipeline, and DreamClear, a cutting-edge Diffusion Transformer (DiT)-based image restoration model. GenIR, our pioneering contribution, is a dual-prompt learning pipeline that overcomes the limitations of existing datasets, which typically comprise only a few thousand images and thus offer limited generalizability for larger models. GenIR streamlines the process into three stages: image-text pair construction, dual-prompt based fine-tuning, and data generation & filtering.