growth condition
The Geometry of Efficient Nonconvex Sampling
Vempala, Santosh S., Wibisono, Andre
We present an efficient algorithm for uniformly sampling from an arbitrary compact body $\mathcal{X} \subset \mathbb{R}^n$ from a warm start under isoperimetry and a natural volume growth condition. Our result provides a substantial common generalization of known results for convex bodies and star-shaped bodies. The complexity of the algorithm is polynomial in the dimension $n$, the Poincaré constant of the uniform distribution on $\mathcal{X}$, and the volume growth constant of the set $\mathcal{X}$.
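For orientation, the Poincaré constant referenced above has the standard definition below; the closing remark on the volume growth condition is our own reading of the abstract, not the paper's exact definition.

```latex
% Poincaré inequality for the uniform distribution \pi on \mathcal{X}:
% C_P is the smallest constant such that, for all smooth f : \mathcal{X} \to \mathbb{R},
\operatorname{Var}_\pi(f) \;\le\; C_P \,\mathbb{E}_\pi\!\left[\|\nabla f\|^2\right].
% A small C_P (good isoperimetry) is what permits rapid mixing; the volume
% growth constant is the paper's second, set-dependent complexity parameter.
```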
Single-Call Stochastic Extragradient Methods for Structured Non-monotone Variational Inequalities: Improved Analysis under Weaker Conditions
Single-call stochastic extragradient methods, like stochastic past extragradient (SPEG) and stochastic optimistic gradient (SOG), have gained a lot of interest in recent years and are among the most efficient algorithms for solving large-scale min-max optimization and variational inequality problems (VIPs) appearing in various machine learning tasks. However, despite their undoubted popularity, current convergence analyses of SPEG and SOG require strong assumptions like bounded variance or growth conditions. In addition, several important questions regarding the convergence properties of these methods are still open, including mini-batching, efficient step-size selection, and convergence guarantees under different sampling strategies. In this work, we address these questions and provide convergence guarantees for two large classes of structured non-monotone VIPs: (i) quasi-strongly monotone problems (a generalization of strongly monotone problems) and (ii) weak Minty variational inequalities (a generalization of monotone and Minty VIPs). We introduce the expected residual condition, explain its benefits, and show how it allows us to obtain a strictly weaker bound than previously used growth conditions, expected co-coercivity, or bounded variance assumptions. Finally, our convergence analysis holds under the arbitrary sampling paradigm, which includes importance sampling and various mini-batching strategies as special cases.
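As a concrete reference point, here is a minimal sketch of the past-extragradient (Popov-style) update that "single-call" refers to: one stochastic oracle evaluation per iteration, with the previous evaluation reused for the extrapolation step. The quadratic min-max problem, noise model, and step size are illustrative assumptions, not the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative strongly monotone VIP: the operator of the min-max problem
#   min_x max_y  (mu/2)||x||^2 + x^T A y - (mu/2)||y||^2,
# whose unique solution is z* = 0.
n, mu = 5, 1.0
A = rng.standard_normal((n, n))

def F(z, noise=0.1):
    # Stochastic operator oracle: exact operator plus additive noise,
    # a stand-in for sampling a component of a finite sum.
    x, y = z[:n], z[n:]
    g = np.concatenate([mu * x + A @ y, mu * y - A.T @ x])
    return g + noise * rng.standard_normal(2 * n)

gamma = 0.05
z = rng.standard_normal(2 * n)   # iterate z_k
g_prev = F(z)                    # stored evaluation F(z_{k-1/2})
for k in range(2000):
    z_half = z - gamma * g_prev  # extrapolate using the *past* evaluation
    g_prev = F(z_half)           # the single oracle call of this iteration
    z = z - gamma * g_prev       # update reuses the same evaluation
print(np.linalg.norm(z))         # small: z approaches z* = 0 up to noise
```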
Transport Quasi-Monte Carlo
Quasi-Monte Carlo (QMC) is a powerful method for evaluating high-dimensional integrals. However, its use is typically limited to distributions where direct sampling is straightforward, such as the uniform distribution on the unit hypercube or the Gaussian distribution. For general target distributions with potentially unnormalized densities, leveraging the low-discrepancy property of QMC to improve accuracy remains challenging. We propose training a transport map to push forward the uniform distribution on the unit hypercube to approximate the target distribution. Inspired by normalizing flows, the transport map is constructed as a composition of simple, invertible transformations. To ensure that randomized QMC (RQMC) achieves its superior error rate, the transport map must satisfy specific regularity conditions. We introduce a flexible parametrization for the transport map that not only meets these conditions but is also expressive enough to model complex distributions. Our theoretical analysis establishes that the proposed transport QMC estimator achieves faster convergence rates than standard Monte Carlo, under mild and easily verifiable growth conditions on the integrand. Numerical experiments confirm the theoretical results, demonstrating the effectiveness of the proposed method in Bayesian inference tasks.
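A minimal sketch of the pipeline the abstract describes, using a fixed (rather than learned) transport map so the example stays self-contained; the Gaussian target and the test integrand are illustrative assumptions.

```python
import numpy as np
from scipy.stats import qmc, norm

d, m = 2, 10
sobol = qmc.Sobol(d=d, scramble=True, seed=0)
u = sobol.random_base2(m=m)          # 2^10 scrambled Sobol' points in (0,1)^d

# Transport map T: (0,1)^d -> R^d. Here T is the coordinatewise inverse
# Gaussian CDF, pushing the uniform cube forward to N(0, I); the paper
# instead *learns* T as a composition of invertible, flow-style layers.
x = norm.ppf(u)

# RQMC vs. plain MC estimate of E[f(X)], X ~ N(0, I), for an illustrative
# smooth integrand whose exact mean is 4/5 for d = 2.
f = lambda z: np.exp(-np.sum(z**2, axis=1) / 8.0)
x_mc = np.random.default_rng(1).standard_normal((2**m, d))
print(f(x).mean(), f(x_mc).mean())   # both near 0.8; RQMC typically closer
```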
A Machine Learning Approach Capturing Hidden Parameters in Autonomous Thin-Film Deposition
Zheng, Yuanlong, Blake, Connor, Mravac, Layla, Zhang, Fengxue, Chen, Yuxin, Yang, Shuolong
The integration of machine learning and robotics into thin film deposition is transforming material discovery and optimization. However, challenges remain in achieving a fully autonomous cycle of deposition, characterization, and decision-making. Additionally, the inherent sensitivity of thin film growth to hidden parameters such as substrate conditions and chamber conditions can compromise the performance of machine learning models. In this work, we demonstrate a fully autonomous physical vapor deposition system that combines in-situ optical spectroscopy, a high-throughput robotic sample handling system, and Gaussian Process Regression models. By employing a calibration layer to account for hidden parameter variations and an active learning algorithm to optimize the exploration of the parameter space, the system fabricates silver thin films with optical reflected power ratios within 2.5% of the target in an average of 2.3 attempts. This approach significantly reduces the time and labor required for thin film deposition, showcasing the potential of machine learning-driven automation in accelerating material development.
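A minimal sketch of the optimize-toward-a-target loop described above, with a simulated deposition process standing in for the real chamber; the simulator, kernel, and acquisition rule are illustrative assumptions, not the authors' implementation.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
target = 0.6                # desired reflected power ratio

def deposit_and_measure(p):
    # Stand-in for one deposition + in-situ optical measurement; hidden-
    # parameter drift is modeled as small additive noise.
    return 0.9 / (1 + np.exp(-4 * (p - 0.5))) + 0.01 * rng.standard_normal()

# Seed the model with a few random process settings (normalized to [0, 1]).
P = rng.uniform(0, 1, size=(3, 1))
y = np.array([deposit_and_measure(p[0]) for p in P])

gp = GaussianProcessRegressor(kernel=RBF(0.2) + WhiteKernel(1e-4), normalize_y=True)
grid = np.linspace(0, 1, 201).reshape(-1, 1)
for attempt in range(10):
    gp.fit(P, y)
    mu, sd = gp.predict(grid, return_std=True)
    # Acquisition: favor settings predicted near the target, plus an
    # exploration bonus on the posterior standard deviation.
    p_next = grid[np.argmin(np.abs(mu - target) - 0.5 * sd)]
    y_next = deposit_and_measure(p_next[0])
    P, y = np.vstack([P, p_next[None, :]]), np.append(y, y_next)
    if abs(y_next - target) / target < 0.025:   # within 2.5% of the target
        break
print(attempt + 1, y_next)
```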
Differentiability and Approximation of Probability Functions under Gaussian Mixture Models: A Bayesian Approach
Contador, Gonzalo, Pérez-Aros, Pedro, Vilches, Emilio
In this work, we study probability functions associated with Gaussian mixture models. Our primary focus is on extending the use of spherical radial decomposition for multivariate Gaussian random vectors to the context of Gaussian mixture models, which are not inherently spherical but only conditionally so. Specifically, the conditional probability distribution, given a random parameter of the random vector, follows a Gaussian distribution, allowing us to apply Bayesian analysis tools to the probability function. This assumption, together with the spherical radial decomposition for Gaussian random vectors, enables us to represent the probability function as an integral over the Euclidean sphere. Using this representation, we establish sufficient conditions to ensure the differentiability of the probability function and provide an integral representation of its gradient. Furthermore, leveraging the Bayesian decomposition, we approximate the probability function using random sampling over the parameter space and the Euclidean sphere. Finally, we present numerical examples that illustrate the advantages of this approach over classical approximations based on random vector sampling.
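For orientation, the single-Gaussian spherical radial decomposition is standard; the mixture extension below is written under our own reading of the abstract (conditioning on the component index), not the paper's exact statement.

```latex
% Spherical radial decomposition: if \xi \sim \mathcal{N}(\mu, \Sigma) with
% \Sigma = LL^\top, then \xi = \mu + R\,L v, where v is uniform on the sphere
% \mathbb{S}^{n-1} and R \sim \chi_n. For \varphi(x) = \mathbb{P}(\xi \in M(x)),
% assuming M(x) is star-shaped with respect to \mu,
\varphi(x) \;=\; \int_{\mathbb{S}^{n-1}} F_{\chi_n}\bigl(\rho(x,v)\bigr)\, d\nu(v),
% where \rho(x,v) is the radial extent of M(x) along direction Lv from \mu and
% \nu is the uniform measure on the sphere. For a mixture
% \xi \sim \sum_i w_i\, \mathcal{N}(\mu_i, \Sigma_i), conditioning on the
% component index i makes each term spherical, giving
\varphi(x) \;=\; \sum_i w_i \int_{\mathbb{S}^{n-1}} F_{\chi_n}\bigl(\rho_i(x,v)\bigr)\, d\nu(v).
```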
Autonomous, Self-driving Multi-Step Growth of Semiconductor Heterostructures Guided by Machine Learning
Shen, Chao, Zhan, Wenkang, Sun, Hongyu, Xin, Kaiyao, Xu, Bo, Wang, Zhanguo, Zhao, Chao
The semiconductor industry has prioritized automating repetitive tasks through closed-loop, autonomous experimentation, which enables accelerated optimization of complex multi-step processes. The emergence of machine learning (ML) has ushered in automated processes with minimal human intervention. In this work, we develop SemiEpi, a self-driving automation platform capable of executing multi-step molecular beam epitaxy (MBE) growth with continuous in-situ monitoring and on-the-fly feedback control. By integrating standard hardware, homemade software, curve fitting, and multiple ML models, SemiEpi operates autonomously, eliminating the need for extensive expertise in MBE processes to achieve optimal outcomes. The platform actively learns from previous experimental results, identifying favorable conditions and proposing new experiments to achieve the desired results. We standardize and optimize growth of InAs/GaAs quantum dot (QD) heterostructures to showcase the power of ML-guided multi-step growth. A temperature calibration was implemented to determine the initial growth condition, and fine control of the process was executed using ML. Leveraging RHEED movies acquired during growth, SemiEpi successfully identified and optimized a novel route for multi-step heterostructure growth. This work demonstrates the capabilities of closed-loop, ML-guided systems in addressing the challenges of multi-step growth for any device. Our method is critical for achieving repeatable materials growth using commercially scalable tools. Our strategy facilitates the development of hardware-independent processes and enhances process repeatability and stability, even without exhaustive knowledge of growth parameters.
Faster Convergence of Stochastic Accelerated Gradient Descent under Interpolation
Mishkin, Aaron, Pilanci, Mert, Schmidt, Mark
A continuing trend in machine learning is the adoption of powerful prediction models which can exactly fit, or interpolate, their training data (Zhang et al., 2017). Methods such as over-parameterized neural networks (Zhang and Yin, 2013; Belkin et al., 2019a), kernel machines (Belkin et al., 2019b), and boosting (Schapire et al., 1997) have all been shown to achieve zero training loss in practice. This phenomenon is particularly prevalent in modern deep learning, where interpolation is conjectured to be key to both optimization (Liu et al., 2022; Oymak and Soltanolkotabi, 2019) and generalization (Belkin, 2021). Recent experimental and theoretical evidence shows that stochastic gradient descent (SGD) matches the fast convergence rates of deterministic gradient methods up to problem-dependent constants when training interpolating models (Arora et al., 2018; Ma et al., 2018; Zou and Gu, 2019). With additional assumptions, interpolation also implies the strong (Polyak, 1987) and weak (Bassily et al., 2018; Vaswani et al., 2019) growth conditions, which bound the second moment of the stochastic gradients. Under strong/weak growth, variance-reduced algorithms typically exhibit slower convergence than stochastic gradient methods despite using more computation or memory (Defazio and Bottou, 2019; Ma et al., 2018), perhaps because these conditions already imply a form of "automatic variance reduction" (Liu et al., 2022). A combination of interpolation and growth conditions has been used to prove fast convergence rates for SGD with line-search (Vaswani et al., 2019), with the stochastic Polyak step-size (Loizou et al., 2020; Berrada et al., 2020), for mirror descent (D'Orazio et al., 2021), and for model-based methods (Asi and Duchi, 2019).
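For reference, the growth conditions mentioned above, in one common parametrization (exact constants and conventions vary across the cited papers):

```latex
% Strong growth condition (SGC): for some \rho \ge 1,
\mathbb{E}_i\bigl[\|\nabla f_i(x)\|^2\bigr] \;\le\; \rho\,\|\nabla f(x)\|^2 .
% Weak growth condition (WGC), in the function-value form of
% Vaswani et al. (2019), with L the smoothness constant:
\mathbb{E}_i\bigl[\|\nabla f_i(x)\|^2\bigr] \;\le\; 2\rho L\,\bigl(f(x) - f(x^*)\bigr) .
% Both force the second moment of the stochastic gradients to vanish at a
% minimizer x^*, which is exactly the interpolation regime.
```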
Controlling Moments with Kernel Stein Discrepancies
Kanagawa, Heishiro, Barp, Alessandro, Gretton, Arthur, Mackey, Lester
Kernel Stein discrepancies (KSDs) measure the quality of a distributional approximation and can be computed even when the target density has an intractable normalizing constant. Notable applications include the diagnosis of approximate MCMC samplers and goodness-of-fit tests for unnormalized statistical models. The present work analyzes the convergence control properties of KSDs. We first show that standard KSDs used for weak convergence control fail to control moment convergence. To address this limitation, we next provide sufficient conditions under which alternative diffusion KSDs control both moment and weak convergence. As an immediate consequence we develop, for each $q > 0$, the first KSDs known to exactly characterize $q$-Wasserstein convergence.
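A minimal sketch of a KSD estimate (squared V-statistic) for a one-dimensional standard Gaussian target with an inverse multiquadric (IMQ) base kernel; this is one standard Langevin-Stein construction, not the specific diffusion KSDs developed in the paper.

```python
import numpy as np

def imq(d, c=1.0, beta=0.5):
    # IMQ kernel k(x, y) = (c^2 + d^2)^(-beta), d = x - y, and its derivatives.
    u = c**2 + d**2
    k = u**(-beta)
    kx = -2 * beta * d * u**(-beta - 1)   # dk/dx
    ky = -kx                               # dk/dy
    kxy = 2 * beta * u**(-beta - 1) - 4 * beta * (beta + 1) * d**2 * u**(-beta - 2)
    return k, kx, ky, kxy

def ksd_sq(x, score):
    # Squared-KSD V-statistic for 1-D samples x, with score s = (log p)':
    # h_p(x, y) = s(x)s(y)k + s(x) dk/dy + s(y) dk/dx + d^2k/dxdy.
    d = x[:, None] - x[None, :]
    k, kx, ky, kxy = imq(d)
    s = score(x)
    h = s[:, None] * s[None, :] * k + s[:, None] * ky + s[None, :] * kx + kxy
    return h.mean()

rng = np.random.default_rng(0)
score = lambda x: -x                    # score of the N(0, 1) target
good = rng.standard_normal(500)
bad = rng.standard_normal(500) + 1.5    # shifted sample: wrong mean
print(ksd_sq(good, score), ksd_sq(bad, score))   # small vs. clearly larger
```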
Differentially Private Non-Convex Optimization under the KL Condition with Optimal Rates
Menart, Michael, Ullah, Enayat, Arora, Raman, Bassily, Raef, Guzmán, Cristóbal
We study the private empirical risk minimization (ERM) problem for losses satisfying the $(\gamma,\kappa)$-Kurdyka-{\L}ojasiewicz (KL) condition. The Polyak-{\L}ojasiewicz (PL) condition is a special case of this condition when $\kappa=2$. Specifically, we study this problem under the constraint of $\rho$ zero-concentrated differential privacy (zCDP). When $\kappa\in[1,2]$ and the loss function is Lipschitz and smooth over a sufficiently large region, we provide a new algorithm based on variance reduced gradient descent that achieves the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ on the excess empirical risk, where $n$ is the dataset size and $d$ is the dimension. We further show that this rate is nearly optimal. When $\kappa \geq 2$ and the loss is instead Lipschitz and weakly convex, we show it is possible to achieve the rate $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^\kappa\big)$ with a private implementation of the proximal point method. When the KL parameters are unknown, we provide a novel modification and analysis of the noisy gradient descent algorithm and show that this algorithm achieves a rate of $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{\frac{2\kappa}{4-\kappa}}\big)$ adaptively, which is nearly optimal when $\kappa = 2$. We further show that, without assuming the KL condition, the same gradient descent algorithm can achieve fast convergence to a stationary point when the gradient stays sufficiently large during the run of the algorithm. Specifically, we show that this algorithm can approximate stationary points of Lipschitz, smooth (and possibly nonconvex) objectives with rate as fast as $\tilde{O}\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)$ and never worse than $\tilde{O}\big(\big(\frac{\sqrt{d}}{n\sqrt{\rho}}\big)^{1/2}\big)$. The latter rate matches the best known rate for methods that do not rely on variance reduction.
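For reference, one common growth-form parametrization of the KL condition, consistent with the abstract's statement that $\kappa=2$ recovers PL; the paper's exact normalization of $\gamma$ may differ.

```latex
% (\gamma,\kappa)-Kurdyka-Lojasiewicz condition, growth form: for all x
% in the relevant region,
f(x) - f^* \;\le\; \gamma\,\|\nabla f(x)\|^{\kappa}.
% With \kappa = 2 and \gamma = 1/(2\mu) this is the Polyak-Lojasiewicz
% inequality \|\nabla f(x)\|^2 \ge 2\mu\,(f(x) - f^*).
```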