AITopics

2405.13817

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > Canada > Ontario > Toronto (0.04)

Genre: Research Report (0.42)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.91)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.68)

Kalinin, Nikita, Lampert, Christoph

Banded Square Root Matrix Factorization for Differentially Private Model Training

arXiv.org Artificial IntelligenceMay-22-2024

Current state-of-the-art methods for differentially private model training are based on matrix factorization techniques. However, these methods suffer from high computational overhead because they require numerically solving a demanding optimization problem to determine an approximately optimal factorization prior to the actual model training. In this work, we present a new matrix factorization approach, BSR, which overcomes this computational bottleneck. By exploiting properties of the standard matrix square root, BSR allows to efficiently handle also large-scale problems. For the key scenario of stochastic gradient descent with momentum and weight decay, we even derive analytical expressions for BSR that render the computational overhead negligible. We prove bounds on the approximation quality that hold both in the centralized and in the federated learning setting. Our numerical experiments demonstrate that models trained using BSR perform on par with the best existing methods, while completely avoiding their computational overhead.

factorization, matrix, participation, (10 more...)

2405.13763

Country:

Europe > Austria (0.04)
Europe > Russia (0.04)
Asia > Russia (0.04)

Genre: Research Report > New Finding (0.46)

Industry: Information Technology > Security & Privacy (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.66)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.54)

Marion, Pierre, Chizat, Lénaïc

Deep linear networks for regression are implicitly regularized towards flat minima

arXiv.org Machine LearningMay-22-2024

The largest eigenvalue of the Hessian, or sharpness, of neural networks is a key quantity to understand their optimization dynamics. In this paper, we study the sharpness of deep linear networks for overdetermined univariate regression. Minimizers can have arbitrarily large sharpness, but not an arbitrarily small one. Indeed, we show a lower bound on the sharpness of minimizers, which grows linearly with depth. We then study the properties of the minimizer found by gradient flow, which is the limit of gradient descent with vanishing learning rate. We show an implicit regularization towards flat minima: the sharpness of the minimizer is no more than a constant times the lower bound. The constant depends on the condition number of the data covariance matrix, but not on width or depth. This result is proven both for a small-scale initialization and a residual initialization. Results of independent interest are shown in both cases. For small-scale initialization, we show that the learned weight matrices are approximately rank-one and that their singular vectors align. For residual initialization, convergence of the gradient flow for a Gaussian initialization of the residual network is proven. Numerical experiments illustrate our results and connect them to gradient descent with non-vanishing learning rate.

deep linear network, initialization, prod, (14 more...)

2405.13456

Country:

North America > United States (0.14)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
Europe > Switzerland > Vaud > Lausanne (0.04)

Genre: Research Report > New Finding (0.66)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.55)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

arXiv.org Machine LearningMay-22-2024

Local convergence of min-max algorithms to differentiable equilibrium on Riemannian manifold

Zhang, Sixin

We study min-max algorithms to solve zero-sum differentiable games on Riemannian manifold. The notions of differentiable Stackelberg equilibrium and differentiable Nash equilibrium in Euclidean space are generalized to Riemannian manifold, through an intrinsic definition which does not depend on the choice of local coordinate chart of manifold. We then provide sufficient conditions for the local convergence of the deterministic simultaneous algorithms $\tau$-GDA and $\tau$-SGA near such equilibrium, using a general methodology based on spectral analysis. These algorithms are extended with stochastic gradients and applied to the training of Wasserstein GAN. The discriminator of GAN is constructed from Lipschitz-continuous functions based on Stiefel manifold. We show numerically how the insights obtained from the local convergence analysis may lead to an improvement of GAN models.

algorithm, convergence, local convergence, (14 more...)

2405.13392

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
Europe > France > Occitanie > Haute-Garonne > Toulouse (0.04)
Oceania > Australia > New South Wales > Sydney (0.04)
(8 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.88)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.66)

Dette, Holger, Graw, Carina

Uncertainty quantification by block bootstrap for differentially private stochastic gradient descent

arXiv.org Machine LearningMay-21-2024

Stochastic Gradient Descent (SGD) is a widely used tool in machine learning. In the context of Differential Privacy (DP), SGD has been well studied in the last years in which the focus is mainly on convergence rates and privacy guarantees. While in the non private case, uncertainty quantification (UQ) for SGD by bootstrap has been addressed by several authors, these procedures cannot be transferred to differential privacy due to multiple queries to the private data. In this paper, we propose a novel block bootstrap for SGD under local differential privacy that is computationally tractable and does not require an adjustment of the privacy budget. The method can be easily implemented and is applicable to a broad class of estimation problems. We prove the validity of our approach and illustrate its finite sample properties by means of a simulation study. As a by-product, the new method also provides a simple alternative numerical tool for UQ for non-private SGD.

confidence interval, ldp, sgd, (13 more...)

2405.12553

Country:

Europe > Austria > Vienna (0.14)
Europe > Germany (0.04)
North America > United States > New York > New York County > New York City (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (0.66)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (1.00)

arXiv.org Artificial IntelligenceMay-19-2024

Approximation and Gradient Descent Training with Neural Networks

Welper, G.

It is well understood that neural networks with carefully hand-picked weights provide powerful function approximation and that they can be successfully trained in over-parametrized regimes. Since over-parametrization ensures zero training error, these two theories are not immediately compatible. Recent work uses the smoothness that is required for approximation results to extend a neural tangent kernel (NTK) optimization argument to an under-parametrized regime and show direct approximation bounds for networks trained by gradient flow. Since gradient flow is only an idealization of a practical method, this paper establishes analogous results for networks trained by gradient descent.

assumption, gradient descent, neural network, (13 more...)

2405.11696

Country:

North America > United States > Florida > Orange County > Orlando (0.14)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.74)

Sun, Youbang, Chen, Shixiang, Garcia, Alfredo, Shahrampour, Shahin

Global Convergence of Decentralized Retraction-Free Optimization on the Stiefel Manifold

arXiv.org Artificial IntelligenceMay-19-2024

Many classical and modern machine learning algorithms require solving optimization tasks under orthogonal constraints. Solving these tasks often require calculating retraction-based gradient descent updates on the corresponding Riemannian manifold, which can be computationally expensive. Recently Ablin et al. proposed an infeasible retraction-free algorithm, which is significantly more efficient. In this paper, we study the decentralized non-convex optimization task over a network of agents on the Stiefel manifold with retraction-free updates. We propose \textbf{D}ecentralized \textbf{R}etraction-\textbf{F}ree \textbf{G}radient \textbf{T}racking (DRFGT) algorithm, and show that DRFGT exhibits ergodic $\mathcal{O}(1/K)$ convergence rate, the same rate of convergence as the centralized, retraction-based methods. We also provide numerical experiments demonstrating that DRFGT performs on par with the state-of-the-art retraction based methods with substantially reduced computational overhead.

algorithm, manifold, optimization, (16 more...)

2405.1159

Country:

North America > United States > Texas > Brazos County > College Station (0.14)
North America > United States > Massachusetts > Suffolk County > Boston (0.04)
Asia > China > Anhui Province > Hefei (0.04)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.67)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.46)
Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Artificial IntelligenceMay-19-2024

Optimization of Worker Scheduling at Logistics Depots Using Genetic Algorithms and Simulated Annealing

Xu, Jinxin, Wu, Haixin, Cheng, Yu, Wang, Liyang, Yang, Xin, Fu, Xintong, Su, Yuelong

The efficient scheduling of permanent and temporary workers is crucial for Improving the efficiency of sortation center management optimizing the efficiency of the logistics depot while has a direct impact on the fulfillment efficiency and minimizing labor usage. The study begins by establishing operational costs of the entire logistics network. Staff a 0-1 integer linear programming model, with decision management in sortation centers is a key challenge. Staffing needs to be adjusted according to the forecasted shipment variables determining the scheduling of permanent and volume to ensure a sufficient workforce to handle the flow of temporary workers for each time slot on a given day. The goods during peak hours while avoiding the wastage of excess objective function aims to minimize person-days, while manpower during low-demand times. Staff scheduling based constraints ensure fulfillment of hourly labor on effective solution algorithms becomes one of the key requirements, limit workers to one time slot per day, cap strategies to improve the efficiency of the sorting center. By consecutive working days for permanent workers, and reasonably allocating regular and temporary workers, the maintain non-negativity and integer constraints. The sorting speed and accuracy can be improved, thus reducing the model is then solved using genetic algorithms and overall logistics cost and improving customer satisfaction.

algorithm, constraint, temporary worker, (12 more...)

2405.11729

Country:

Asia > Russia (0.05)
North America > United States > New York (0.04)
Europe > Russia > Central Federal District > Moscow Oblast > Moscow (0.04)
(2 more...)

Genre: Research Report (0.50)

Industry:

Transportation > Freight & Logistics Services (0.56)
Health & Medicine (0.49)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.43)

arXiv.org Machine LearningMay-18-2024

Learning from Imperfect Human Feedback: a Tale from Corruption-Robust Dueling

Cheng, Yuwei, Yao, Fan, Liu, Xuefeng, Xu, Haifeng

This paper studies Learning from Imperfect Human Feedback (LIHF), motivated by humans' potential irrationality or imperfect perception of true preference. We revisit the classic dueling bandit problem as a model of learning from comparative human feedback, and enrich it by casting the imperfection in human feedback as agnostic corruption to user utilities. We start by identifying the fundamental limits of LIHF and prove a regret lower bound of $\Omega(\max\{T^{1/2},C\})$, even when the total corruption $C$ is known and when the corruption decays gracefully over time (i.e., user feedback becomes increasingly more accurate). We then turn to design robust algorithms applicable in real-world scenarios with arbitrary corruption and unknown $C$. Our key finding is that gradient-based algorithms enjoy a smooth efficiency-robustness tradeoff under corruption by varying their learning rates. Specifically, under general concave user utility, Dueling Bandit Gradient Descent (DBGD) of Yue and Joachims (2009) can be tuned to achieve regret $O(T^{1-\alpha} + T^{ \alpha} C)$ for any given parameter $\alpha \in (0, \frac{1}{4}]$. Additionally, this result enables us to pin down the regret lower bound of the standard DBGD (the $\alpha=1/4$ case) as $\Omega(T^{3/4})$ for the first time, to the best of our knowledge. For strongly concave user utility we show a better tradeoff: there is an algorithm that achieves $O(T^{\alpha} + T^{\frac{1}{2}(1-\alpha)}C)$ for any given $\alpha \in [\frac{1}{2},1)$. Our theoretical insights are corroborated by extensive experiments on real-world recommendation data.

algorithm, corruption, learning, (14 more...)

2405.11204

Country:

North America > United States > Illinois > Cook County > Chicago (0.04)
North America > United States > Virginia (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Personal Assistant Systems (0.67)
Information Technology > Data Science > Data Mining > Big Data (0.48)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.34)

arXiv.org Artificial IntelligenceMay-17-2024

Nonparametric Teaching of Implicit Neural Representations

Zhang, Chen, Luo, Steven Tin Sui, Li, Jason Chun Lok, Wu, Yik-Chung, Wong, Ngai

We investigate the learning of implicit neural representation (INR) using an overparameterized multilayer perceptron (MLP) via a novel nonparametric teaching perspective. The latter offers an efficient example selection framework for teaching nonparametrically defined (viz. non-closed-form) target functions, such as image functions defined by 2D grids of pixels. To address the costly training of INRs, we propose a paradigm called Implicit Neural Teaching (INT) that treats INR learning as a nonparametric teaching problem, where the given signal being fitted serves as the target function. The teacher then selects signal fragments for iterative training of the MLP to achieve fast convergence. By establishing a connection between MLP evolution through parameter-based gradient descent and that of function evolution through functional gradient descent in nonparametric teaching, we show for the first time that teaching an overparameterized MLP is consistent with teaching a nonparametric learner. This new discovery readily permits a convenient drop-in of nonparametric teaching algorithms to broadly enhance INR training efficiency, demonstrating 30%+ training time savings across various input modalities.

gradient descent, nonparametric teaching, teaching, (13 more...)

2405.10531

Country:

Europe > Austria > Vienna (0.14)
Asia > China > Hong Kong (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
(2 more...)

Genre: Research Report > New Finding (0.48)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Gradient Descent (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Perceptrons (0.54)