A Contour Stochastic Gradient Langevin Dynamics Algorithm for Simulations of Multi-modal Distributions
We propose an adaptively weighted stochastic gradient Langevin dynamics (SGLD) algorithm, called contour stochastic gradient Langevin dynamics (CSGLD), for Bayesian learning in big-data statistics. The proposed algorithm is essentially a scalable dynamic importance sampler that automatically flattens the target distribution, greatly facilitating the simulation of multi-modal distributions. Theoretically, we prove a stability condition and establish the asymptotic convergence of the self-adapting parameter to a unique fixed point, regardless of the non-convexity of the original energy function; we also present an error analysis for the weighted averaging estimators. Empirically, CSGLD is tested on multiple benchmark datasets, including CIFAR10 and CIFAR100. The numerical results indicate its superiority over existing state-of-the-art algorithms for training deep neural networks.
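To make the mechanism concrete, below is a minimal sketch of a CSGLD-style update on a toy one-dimensional double-well energy: the sampler maintains adaptive weights over energy subregions and rescales the gradient so that well-visited modes are de-emphasized. The partition of the energy range, the flattening exponent `zeta`, and the step-size schedule are illustrative assumptions, not the paper's tuned settings.

```python
import numpy as np

# Hedged sketch of a CSGLD-style update on a 1-D double-well energy.
# Partition edges, zeta, and schedules are illustrative assumptions.

def energy(x):
    return (x ** 2 - 1.0) ** 2          # double-well U(x)

def grad_energy(x):
    return 4.0 * x * (x ** 2 - 1.0)

m, zeta = 50, 0.75                      # number of energy subregions, flattening exponent
edges = np.linspace(0.0, 5.0, m + 1)    # partition of the energy range
du = edges[1] - edges[0]
theta = np.full(m, 1.0 / m)             # self-adapting weights (estimated density of states)
x, lr = 0.0, 1e-3                       # state and Langevin step size

rng = np.random.default_rng(0)
for t in range(1, 20001):
    k = int(np.clip(np.searchsorted(edges, energy(x)) - 1, 0, m - 1))
    # Gradient multiplier flattens the landscape: heavily visited
    # subregions (large theta[k]) get their gradient de-emphasized.
    mult = 1.0 + zeta * (np.log(theta[k]) - np.log(theta[max(k - 1, 0)])) / du
    x += -lr * mult * grad_energy(x) + np.sqrt(2.0 * lr) * rng.standard_normal()
    # Stochastic-approximation update of theta toward its fixed point.
    omega = 1.0 / t ** 0.6
    onehot = np.zeros(m)
    onehot[k] = 1.0
    theta = theta + omega * theta[k] ** zeta * (onehot - theta)
    theta = np.clip(theta, 1e-10, None)
    theta /= theta.sum()
```

Run long enough, the chain crosses between the two wells far more often than plain SGLD would, which is the dynamic-importance-sampling effect the abstract describes.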
Author feedback (fragmentary excerpts): We thank all the reviewers for the valuable comments.
- Advantages of CSGLD over M-SGD: (i) CSGLD belongs to the class of adaptive biasing force algorithms and … Empirically, we suggest partitioning the sample space into a moderate number of subregions, e.g. …
- Drawbacks of simulated annealing (SA) and replica exchange SGLD (reSGLD)/parallel tempering: SA can only be …
- Q2. Missing baselines: We further compared CSGLD with CyclicalSGLD and reSGLD on an asymmetric mixture … We will include the baselines and references in the next version.
- The gradient-vanishing problem in SGLD is not clear: Please refer to our reply to Q1 of Reviewer 1.
- Q1. Comments on bizarre peaks: A bizarre peak always indicates that there is a local minimum of the same energy in …
- Q3. …
Differentiable Simulation of Soft Multi-body Systems
Yi-Ling Qiao (University of Maryland, College Park), Vladlen Koltun
We present a method for differentiable simulation of soft articulated bodies. Our work enables the integration of differentiable physical dynamics into gradient-based pipelines. We develop a top-down matrix assembly algorithm within Projective Dynamics and derive a generalized dry friction model for soft continuum materials using a new matrix splitting strategy. We also derive a differentiable control framework for soft articulated bodies driven by muscles, joint torques, or pneumatic tubes. Experiments demonstrate that our designs make soft-body simulation more stable and realistic than other frameworks. Our method accelerates the solution of system identification problems by more than an order of magnitude and enables efficient gradient-based learning of motion control with soft robots.
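As a rough illustration of the gradient-based system identification the abstract refers to, here is a minimal differentiable-simulation sketch in PyTorch. A toy mass-spring chain with explicit Euler integration stands in for the paper's Projective Dynamics solver; the stiffness parameter, target state, and loss are invented for the example.

```python
import torch

# Hedged sketch: system identification through a differentiable rollout.
# A 3-node mass-spring chain stands in for Projective Dynamics; values
# below are illustrative, not the paper's formulation.

def simulate(stiffness, steps=100, dt=1e-2):
    x = torch.tensor([0.0, 1.2, 2.1])     # node positions, rest length 1.0
    v = torch.zeros(3)
    for _ in range(steps):
        d1, d2 = x[1] - x[0], x[2] - x[1]
        f1 = stiffness * (d1 - 1.0)        # spring forces toward rest length
        f2 = stiffness * (d2 - 1.0)
        forces = torch.stack([f1, -f1 + f2, -f2])
        v = v + dt * forces                # explicit Euler (PD uses an implicit solve)
        x = x + dt * v
    return x

stiffness = torch.tensor(5.0, requires_grad=True)
target = torch.tensor([0.1, 1.0, 2.0])     # observed final state to match
opt = torch.optim.Adam([stiffness], lr=0.1)
for it in range(50):
    opt.zero_grad()
    loss = ((simulate(stiffness) - target) ** 2).sum()
    loss.backward()                        # gradients flow through the whole rollout
    opt.step()
```

Because the entire rollout is built from differentiable tensor operations, a single `backward()` call yields the gradient of the final-state loss with respect to the physical parameter, which is what makes gradient-based identification an order of magnitude faster than derivative-free search.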
Suppress Content Shift: Better Diffusion Features via Off-the-Shelf Generation Techniques
Zitai Wang, Zhiyong Yang
Diffusion models are powerful generative models, and this capability can also be applied to discrimination: the inner activations of a pre-trained diffusion model can serve as features for discriminative tasks, namely diffusion features. We find that diffusion features have been hindered by a hidden yet universal phenomenon that we call content shift. Specifically, there are content differences between the features and the input image, such as the exact shape of a certain object. We trace the cause of content shift to an inherent characteristic of diffusion models, which suggests that this phenomenon exists broadly in diffusion features.
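For readers unfamiliar with the diffusion-feature pipeline, the sketch below shows the generic recipe: run a noised image through a denoiser once and read off an intermediate activation via a forward hook. `TinyUNet`, the tapped layer, and the single-timestep choice are stand-in assumptions, not the paper's setup.

```python
import torch
import torch.nn as nn

# Hedged sketch: harvesting an intermediate denoiser activation as a
# "diffusion feature". TinyUNet is a stand-in for a pre-trained model.

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.down = nn.Conv2d(3, 16, 3, stride=2, padding=1)
        self.mid = nn.Conv2d(16, 16, 3, padding=1)   # layer we tap for features
        self.up = nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1)
    def forward(self, x, t):
        h = torch.relu(self.down(x)) + t.view(-1, 1, 1, 1)
        h = torch.relu(self.mid(h))
        return self.up(h)

model, feats = TinyUNet().eval(), {}
model.mid.register_forward_hook(lambda m, i, o: feats.update(out=o.detach()))

img = torch.randn(1, 3, 32, 32)                 # a (normalized) input image
t = torch.tensor([50.0])                        # one diffusion timestep
noisy = img + 0.1 * torch.randn_like(img)       # lightly noised input
with torch.no_grad():
    model(noisy, t)
feature = feats["out"].mean(dim=(2, 3))         # pooled feature for a downstream classifier
```

Content shift, in these terms, is the gap between what `feats["out"]` encodes and what `img` actually contains; the paper's point is that off-the-shelf generation techniques can shrink that gap without retraining the model.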
Supplementary Material to Linear Disentangled Representations and Unsupervised Action Estimation
When we predict post-action latent codes through a linear combination of representations, we lose the guarantee that the gradient will point towards this solution. Since REINFORCE applies a single representation exactly once, we are guaranteed that (if the policy is accurate and the latent structure is amenable) the gradient will point towards this solution. We find that the cyclic representation error ‖α̂ − α‖ = 0.157 is far worse than the 0.012 error of RGrVAE. Furthermore, the independence score is 0.830.
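As a concrete illustration of the representation error being reported, the following sketch compares a learned action matrix against a ground-truth cyclic one. The 2-D rotation ground truth and the perturbation scale are invented for the example; the supplementary's actual actions and dimensions may differ.

```python
import numpy as np

# Hedged sketch: Frobenius distance between a learned action
# representation and a ground-truth cyclic action.

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

alpha = rotation(2 * np.pi / 5)    # ground-truth cyclic action (order 5: alpha^5 = I)
alpha_hat = alpha + 0.05 * np.random.default_rng(0).normal(size=(2, 2))  # learned matrix

err = np.linalg.norm(alpha_hat - alpha)
print(f"representation error ||alpha_hat - alpha|| = {err:.3f}")
```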
Supplementary Material: Estimation of Conditional Moment Models
The most prevalent approach for estimating endogenous regression models with instruments is assuming low-dimensional linear relationships, i.e. h(x) = ⟨θ, x⟩, and applying two-stage least squares (2SLS): the endogenous regressors x are first regressed on the instruments z, and the outcome y is then regressed on the fitted values. The coefficient in the final regression is taken to be the estimate of θ. Non-parametric sieve extensions of this idea expand x and z into growing feature spaces; then a 2SLS estimation method is applied on these transformed feature spaces. The authors show asymptotic consistency of the resulting estimator, assuming that the approximation error goes to zero. Subsequently, they also estimate the function m(z) = E[y − h(x) | z] based on another growing sieve.

Though it may seem at first that the approach in that paper and ours are quite distinct, the population limit of our objective function coincides with theirs. To see this, consider the simplified version of our estimator presented in (6), where the function classes are already norm-constrained and no norm-based regularization is imposed. Moreover, for a moment consider the population version of this estimator, i.e.

min_{h∈H} max_{f∈F: ‖f‖≤1} E[(y − h(x)) f(z)].

When the test-function class F is rich enough, the inner maximum equals ‖E[y − h(x) | z]‖ by Cauchy-Schwarz. Thus in the population limit and without norm regularization on the test function f, our criterion is equivalent to the minimum distance criterion analyzed in Chen and Pouzo [2012]. Another point of similarity is that we prove convergence of the estimator in terms of the pseudo-metric, the projected MSE defined in Section 4 of Chen and Pouzo [2012], and like that paper we require additional conditions to relate the pseudo-metric to the true MSE. The present paper differs in a number of ways: (i) the finite-sample criterion is different; (ii) we prove our results using localized Rademacher analysis, which allows for weaker assumptions; (iii) we consider a broader range of estimation approaches than linear sieves, necessitating more of a focus on optimization. Digging into the second point, Chen and Pouzo [2012] take a more traditional parameter-recovery approach, which requires several minimum-eigenvalue conditions and several regularity conditions to be satisfied for their estimation rate to hold. This is analogous to a mean squared error proof in an exogenous linear regression setting that requires the minimum eigenvalue of the feature covariance to be bounded away from zero. Moreover, such parameter-recovery methods seem limited to the growing-sieve approach, since only then does one have a clear finite-dimensional parameter vector to work with for each fixed n.
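To make the minimax criterion concrete, here is a small adversarial-training sketch of the population objective above on simulated instrumental-variable data; the network architectures, the penalty form standing in for the unit-norm constraint, and the optimization schedule are illustrative choices rather than the paper's estimator.

```python
import torch
import torch.nn as nn

# Hedged sketch of the minimax IV criterion:
#   min over h, max over test functions f of E[(y - h(x)) f(z)] - E[f(z)^2],
# with the quadratic penalty standing in for the unit-norm constraint.

torch.manual_seed(0)
n = 2000
z = torch.randn(n, 1)                    # instrument
u = torch.randn(n, 1)                    # unobserved confounder
x = z + 0.5 * u                          # endogenous regressor
y = 2.0 * x + u                          # true h(x) = 2x, confounded noise

h = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
f = nn.Sequential(nn.Linear(1, 32), nn.ReLU(), nn.Linear(32, 1))
opt_h = torch.optim.Adam(h.parameters(), lr=1e-3)
opt_f = torch.optim.Adam(f.parameters(), lr=1e-3)

for step in range(2000):
    # Adversary f ascends the penalized moment (h held fixed).
    resid = (y - h(x)).detach()
    loss_f = -((resid * f(z)).mean() - (f(z) ** 2).mean())
    opt_f.zero_grad(); loss_f.backward(); opt_f.step()
    # Learner h descends the moment against the current test function.
    loss_h = ((y - h(x)) * f(z).detach()).mean()
    opt_h.zero_grad(); loss_h.backward(); opt_h.step()
```

Plain least squares on this data would be biased toward the confounded slope; driving the moment E[(y − h(x)) f(z)] to zero against a rich adversary recovers the structural function instead.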
Global Convergence of Online Optimization for Nonlinear Model Predictive Control
We study a real-time iteration (RTI) scheme for solving online optimization problems arising in nonlinear optimal control. The proposed RTI scheme modifies the existing RTI-based model predictive control (MPC) algorithm by selecting the stepsize of each Newton step at each sampling time using a differentiable exact augmented Lagrangian. The scheme adaptively selects the penalty parameters of the augmented Lagrangian on the fly; these are shown to stabilize after a certain time period. We prove under generic assumptions that, by incorporating stepsize selection instead of always taking a full Newton step (as most existing RTI schemes do), the scheme converges globally: for any initial point, the KKT residuals of the subproblems converge to zero. A key step is to show that the augmented Lagrangian keeps decreasing as the horizon moves forward. We demonstrate the global convergence behavior of the proposed RTI scheme in a numerical experiment.
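The following toy sketch illustrates the core mechanism on a single subproblem: a Newton step on the KKT system of an equality-constrained quadratic program, with the stepsize chosen by backtracking on an augmented Lagrangian merit function rather than always taking the full step. The QP data, the fixed penalty parameter, and the backtracking constants are illustrative assumptions (the paper adapts the penalty parameters online and uses an exact augmented Lagrangian).

```python
import numpy as np

# Hedged sketch: one RTI-style subproblem solve. Newton direction from the
# KKT system, stepsize from backtracking on an augmented Lagrangian merit.

Q = np.array([[2.0, 0.0], [0.0, 2.0]])      # subproblem Hessian
c = np.array([-2.0, -5.0])                  # linear cost term
A = np.array([[1.0, 1.0]])                  # equality constraint A x = b
b = np.array([1.0])
mu = 10.0                                   # penalty parameter (fixed here; adaptive in the paper)

def merit(x, lam):
    # augmented Lagrangian: f(x) + lam' r + (mu/2) ||r||^2, r = A x - b
    r = A @ x - b
    return 0.5 * x @ Q @ x + c @ x + lam @ r + 0.5 * mu * (r @ r)

x, lam = np.zeros(2), np.zeros(1)
for it in range(20):
    # Assemble and solve the KKT system for the full Newton direction.
    K = np.block([[Q, A.T], [A, np.zeros((1, 1))]])
    rhs = -np.concatenate([Q @ x + c + A.T @ lam, A @ x - b])
    d = np.linalg.solve(K, rhs)
    dx, dlam = d[:2], d[2:]
    alpha = 1.0                              # backtrack instead of forcing the full step
    while merit(x + alpha * dx, lam + alpha * dlam) > merit(x, lam) - 1e-4 * alpha * np.dot(d, d):
        alpha *= 0.5
        if alpha < 1e-8:
            break
    x, lam = x + alpha * dx, lam + alpha * dlam
```

The global-convergence argument in the paper hinges on exactly this kind of monotone decrease of the (exact) augmented Lagrangian across sampling times, which the full-step RTI variants cannot guarantee far from the solution.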